\begin{abstract}%   <- trailing '%' for backward compatibility of .sty file
		This article examines the convergence rate and mean-square-error performance of momentum stochastic gradient methods {\color{black}in the constant step-size and slow adaptation regime}. The results establish that momentum methods are equivalent to
		the standard stochastic gradient method with a re-scaled (larger) step-size value.
		The size of the re-scaling is determined by the value of the momentum parameter. The equivalence is established for all time instants, not only in steady state. The analysis is carried out for general {\color{black}strongly convex and smooth} risk functions and is not limited to quadratic risks. One notable conclusion is that the well-known benefits of momentum constructions for deterministic optimization problems do not necessarily carry over to {\color{black}the adaptive online setting when small constant step-sizes are used to enable continuous adaptation and learning in the presence of persistent gradient noise.} {\color{black}In simulations, the equivalence between momentum and standard stochastic gradient methods is also observed for non-differentiable and non-convex problems.}
%		{\color{red}The analysis also suggests a method to retain some of the advantages of the momentum construction by employing a decaying momentum parameter, as opposed to a decaying step-size. In this way, the enhanced convergence rate during the initial stages of adaptation is preserved without the often-observed degradation in MSD performance.}
	\end{abstract}
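
% Illustrative sketch (not part of the original abstract).
The equivalence claim can be pictured with a minimal sketch, assuming the heavy-ball form of the momentum recursion. The notation is an assumption introduced here for illustration only: $\mu$ denotes the step-size, $\beta$ the momentum parameter, $w_i$ and $x_i$ the iterates, $\widehat{\nabla J}$ a stochastic gradient approximation, and $\mu_s$ the re-scaled step-size. The factor $1/(1-\beta)$ is the usual heuristic scaling for heavy-ball momentum and is not quoted from the abstract. The momentum recursion is
\[
  w_i = w_{i-1} - \mu\,\widehat{\nabla J}(w_{i-1}) + \beta\,(w_{i-1} - w_{i-2}), \qquad 0 \le \beta < 1,
\]
and the standard stochastic gradient recursion it would be compared against is
\[
  x_i = x_{i-1} - \mu_s\,\widehat{\nabla J}(x_{i-1}), \qquad \mu_s = \frac{\mu}{1-\beta} \ge \mu .
\]
Informally, when the iterates vary slowly, $w_{i-1} - w_{i-2} \approx w_i - w_{i-1}$, so that $(1-\beta)(w_i - w_{i-1}) \approx -\mu\,\widehat{\nabla J}(w_{i-1})$, which is a standard update with step-size $\mu/(1-\beta)$. Under this assumed scaling, $\beta = 0.9$ would correspond to $\mu_s = 10\mu$.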