\begin{abstract}
  We consider the minimization of an objective function given access
  to unbiased estimates of its gradient through stochastic gradient
  descent (SGD) with constant step-size. While a detailed analysis
  has so far been performed only for quadratic functions, we provide
  an explicit asymptotic expansion of the moments of the averaged SGD
  iterates that outlines the dependence on initial conditions, the
  effect of noise and the step-size, as well as the lack of
  convergence in the general (non-quadratic) case. To this end, we
  bring tools from Markov chain theory into the analysis of
  stochastic gradient methods. We then show that Richardson-Romberg
  extrapolation may be used to get closer to the global optimum, and
  we demonstrate empirical improvements of the new extrapolation
  scheme.
\end{abstract}