\begin{abstract}
  This paper presents a thorough theoretical analysis of Stochastic Gradient
  Descent (SGD) with non-increasing step sizes.  First, we show that the
  recursion defining SGD can be provably approximated by solutions of a
  time-inhomogeneous Stochastic Differential Equation (SDE) using an appropriate
  coupling. In the specific case of batch noise, we refine our results using
  recent advances in Stein's method. Then, motivated by recent analyses of
  deterministic and stochastic optimization methods through their continuous
  counterparts, we study the long-time behavior of the continuous processes at
  hand and establish non-asymptotic bounds. To that end, we develop new
  comparison techniques which are of independent interest. Adapting these
  techniques to the discrete setting, we show that the same results hold for the
  corresponding SGD sequences.  In our analysis, we notably improve
  non-asymptotic bounds in the convex setting for SGD under weaker assumptions
  than those considered in previous works. Finally, we also establish
  finite-time convergence results under various conditions, including
  relaxations of the famous \L ojasiewicz inequality, which can be applied to a
  class of non-convex functions.
\end{abstract}