abstract:6cc1ea71e69ad523.tex

1: \begin{abstract}%   <- trailing '%' for backward compatibility of .sty file

2:

3: Many popular learning-rate schedules for deep neural networks combine a decaying trend with local perturbations that attempt to escape saddle points and bad local minima. We derive convergence guarantees for bandwidth-based step-sizes, a general class of learning rates that are allowed to vary within a banded region. This framework includes many popular cyclic and non-monotonic step-sizes for which no theoretical guarantees were previously known. We provide worst-case guarantees for SGD on smooth non-convex problems under several bandwidth-based step-sizes, including stagewise $1/\sqrt{t}$ and the popular \emph{step-decay} (``constant and then drop by a constant''), which is also shown to be optimal. Moreover, we show that the momentum variant of SGD converges as fast as SGD with the bandwidth-based step-decay step-size. Finally, we propose novel step-size schemes in the bandwidth-based family and verify their efficiency on several deep neural network training tasks.

4:

5:

6: % \iff

7: % It is known that the step-size is the most important hyper-parameter in machine learning, especially for deep neural networks. This paper investigates the bandwidth-based policy, which allows the step-size to vary within a banded region and hence has potential benefits for nonconvex optimization. We provide worst-case theoretical guarantees for SGD on smooth nonconvex problems under bandwidth step-sizes, e.g., $1/\sqrt{t}$ in a stagewise manner and the popular \emph{step-decay} (constant and then drop by a constant), which is also optimal. Moreover, we show that its momentum variant (SGDM) converges as fast as SGD with the bandwidth step-decay step-size. The analysis also provides theoretical guarantees for cyclical step-sizes that lie within the band. Finally, we propose some bandwidth step-sizes and verify their efficiency on several deep neural network tasks.

8: % \iffalse

9:

10:

11:

12:

13:

14:

15: \end{abstract}

16: