
\begin{abstract}
    The stochastic proximal gradient method is a powerful generalization of the widely used stochastic gradient descent (\algname{SGD}) method and has found numerous applications in machine learning.
    However, it is well known that this method fails to converge in non-convex settings where the stochastic noise is significant (i.e.\ when only small or bounded batch sizes are used).
    In this paper, we focus on the stochastic proximal gradient method with Polyak momentum.
    We prove that this method attains an optimal convergence rate for non-convex composite optimization problems, regardless of batch size.
    Additionally, we rigorously analyze the variance reduction effect of Polyak momentum in the composite optimization setting, and we show that the method also converges when the proximal step can only be solved inexactly.
    Finally, we provide numerical experiments to validate our theoretical results.
\end{abstract}
