
\begin{abstract}

We propose new continuous-time formulations for first-order stochastic optimization algorithms such as mini-batch gradient descent and variance-reduced methods.

We combine these continuous-time models with a simple Lyapunov analysis and tools from stochastic calculus to derive convergence bounds for various classes of non-convex functions. Guided by this analysis, we show that the same Lyapunov arguments hold in discrete time, leading to matching rates. In addition, we use these models and It\^{o} calculus to gain new insight into the dynamics of SGD, proving that a decreasing learning rate acts as time warping or, equivalently, as landscape stretching.

% We contrast these bounds to their known equivalent in discrete-time as well as derive new bounds, both in continuous and discrete time. Our model also includes SVRG, for which we derive a linear convergence rate for the class of restricted secant functions.

\end{abstract}
