\begin{abstract}
We consider the minimization of an objective function given access to
unbiased estimates of its gradient through stochastic gradient descent
(SGD) with constant step-size. While the detailed analysis was
previously only performed for quadratic functions, we provide an
explicit asymptotic expansion of the moments of the averaged SGD
iterates that outlines the dependence on initial conditions, the effect
of noise and the step-size, as well as the lack of convergence in the
general (non-quadratic) case. For this analysis, we bring tools from
Markov chain theory into the analysis of stochastic gradient methods.
We then show that Richardson-Romberg extrapolation may be used to get
closer to the global optimum, and we report empirical improvements of
the new extrapolation scheme.
\end{abstract}
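As a minimal sketch of the extrapolation idea, and not the authors' implementation, the Python code below runs averaged constant step-size SGD with step-sizes $\gamma$ and $2\gamma$ and forms $2\bar\theta_\gamma - \bar\theta_{2\gamma}$. If the limit of the averaged iterate admits an expansion $\theta^* + C\gamma + O(\gamma^2)$, this combination cancels the first-order term in $\gamma$. The toy objective $f(\theta) = e^\theta - \theta$ (minimizer $\theta^* = 0$), the additive Gaussian gradient noise, the function names, and all parameter values are illustrative assumptions. The objective is chosen non-quadratic so that averaged SGD with a constant step-size is biased.

```python
import math
import random


def averaged_sgd(step, n_iter, sigma=1.0, theta0=1.0, seed=0):
    """Constant step-size SGD with iterate averaging on the toy objective
    f(theta) = exp(theta) - theta, whose minimizer is theta* = 0.

    Assumption (illustrative): the stochastic gradient is
    f'(theta) + sigma * N(0, 1), i.e. unbiased with additive Gaussian noise.
    Returns the average of the iterates theta_1, ..., theta_n.
    """
    rng = random.Random(seed)  # dedicated generator so runs are reproducible
    theta = theta0
    total = 0.0
    for _ in range(n_iter):
        grad = math.exp(theta) - 1.0 + sigma * rng.gauss(0.0, 1.0)
        theta -= step * grad
        total += theta
    return total / n_iter


def richardson_romberg(step, n_iter, seed=0):
    """Combine the averaged iterates obtained with step-sizes gamma and 2*gamma.

    If theta_bar(gamma) -> theta* + C*gamma + O(gamma^2), then
    2*theta_bar(gamma) - theta_bar(2*gamma) removes the O(gamma) bias term.
    """
    avg_gamma = averaged_sgd(step, n_iter, seed=seed)
    # Same seed: both runs see the same noise sequence (common random numbers).
    avg_2gamma = averaged_sgd(2.0 * step, n_iter, seed=seed)
    return avg_gamma, avg_2gamma, 2.0 * avg_gamma - avg_2gamma


if __name__ == "__main__":
    a, b, rr = richardson_romberg(step=0.1, n_iter=200_000)
    print(f"gamma: {a:+.4f}  2*gamma: {b:+.4f}  extrapolated: {rr:+.4f}")
```

Both runs reuse the same seed. This is a design choice (common random numbers) that makes the noise in the two averages strongly correlated, so most of it cancels in the combination. Independent seeds would also be valid but would give a noisier extrapolated estimate.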