\begin{abstract}

We study the application of variance reduction (VR) techniques to general non-convex stochastic optimization problems. In this setting, the recent work STORM \cite{cutkosky2019momentum} overcomes a drawback of earlier VR methods, which rely on computing gradients over ``mega-batches''. STORM instead uses recursive momentum to achieve the VR effect, and it was later made fully adaptive in STORM+ \cite{levy2021storm+}. Full adaptivity removes the need to know problem-specific parameters, such as the smoothness of the objective and bounds on the variance and norm of the stochastic gradients, in order to set the step size. However, STORM+ crucially relies on the assumption that the function values are bounded, which excludes a large class of useful functions. In this work, we propose $\algnamenew$, a generalized framework of STORM+ that removes the bounded-function-value assumption while still attaining the optimal convergence rate for non-convex optimization. $\algnamenew$ not only retains full adaptivity, removing the need to know problem-specific parameters, but also improves the dependence of the convergence rate on the problem parameters. Furthermore, $\algnamenew$ admits a wide range of parameter settings that subsumes previous methods, allowing more flexibility across a wider range of settings. Finally, we demonstrate the effectiveness of $\algnamenew$ through experiments on common deep learning tasks. Our algorithm improves upon the previous work STORM+ and, with the addition of per-coordinate updates and exponential-moving-average heuristics, is competitive with widely used algorithms.

\end{abstract}
