\begin{abstract}
    The stochastic proximal gradient method is a powerful generalization of the widely used stochastic gradient descent (\algname{SGD}) method and has found numerous applications in Machine Learning.
    However, it is well known that this method fails to converge in non-convex settings where the stochastic noise is significant (i.e.,\ when only small or bounded batch sizes are used).
    In this paper, we focus on the stochastic proximal gradient method with Polyak momentum.
    We prove that this method attains an optimal convergence rate for non-convex composite optimization problems, regardless of batch size.
    Additionally, we rigorously analyze the variance reduction effect of Polyak momentum in the composite optimization setting,
    and we show that the method also converges when the proximal step can only be solved inexactly. Finally, we provide numerical experiments to validate our theoretical results.
\end{abstract}