1: \begin{abstract}
2: We propose a new stochastic gradient method called \MOTAPS (Moving Targetted Polyak Stepsize) that uses recorded past loss values to compute adaptive stepsizes. \MOTAPS can be seen as a variant of the Stochastic Polyak~(\SP) method, which also uses loss values to adjust the stepsize. The downside of the \SP method is that it only converges when the interpolation condition holds. \MOTAPS is an extension of \SP that does not rely on the interpolation condition. The \MOTAPS method uses $n$ auxiliary variables, one for each data point, that track the loss value of each data point. We provide a global convergence theory for \SP, an intermediary method \TAPS, and \MOTAPS by showing that they can all be interpreted as special variants of online SGD. We also perform several numerical experiments on convex learning problems and on deep learning models for image classification and language translation. In all of our tasks we show that \MOTAPS is competitive with the relevant baseline method.
3:   
4:   %Our starting point to develop \MOTAPS  is to note that the \SP method is a (subsampled) Newton-Raphson method applied to solving certain \emph{interpolation equations}. The solution to these interpolation equations is the optimal point so long as the interpolation assumption holds.  By introducing $n$ auxiliary variables (one for each data point), we form new optimality equations 
5: %
6: %We then form new optimality equations similar
7: %
8: %We then use this viewpoint to develop a new variant of the \SP method that converges without interpolation called \MOTAPS.
9: % To derive our new method, we first show that  it can be interpreted as a subsampled Newton-Raphson method applied to certain optimality equations that hold under interpolation. 
10: % We then drop the interpolation assumption and show that similar 
11: % 
12: %  by introducing new auxiliary variables we derive new optimality equations tha
13: \end{abstract}
14: