\begin{abstract}
In this note we give a simple proof of the convergence of stochastic gradient descent (SGD) methods on $\mu$-convex functions under a (milder than standard) $L$-smoothness assumption.
We show that, for carefully chosen stepsizes, SGD
converges after $T$ iterations as $\cO\left( L \norm{\xx_0-\xx^\star}^2 \exp \bigl[-\frac{\mu}{4L}T\bigr] + \frac{\sigma^2}{\mu T} \right)$, where $\sigma^2$ measures the variance of the stochastic noise. For deterministic gradient descent (GD) and for SGD in the interpolation setting we have $\sigma^2 = 0$, and we recover the exponential convergence rate. The bound matches the best known iteration complexity of GD and SGD, up to constants.
\end{abstract}
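
As a rough illustration of how the two terms of the stated bound translate into iteration counts, consider a target accuracy $\epsilon > 0$. This is a sketch only: $\epsilon$ is a symbol introduced here for illustration, and the constants hidden in the $\cO$ notation are ignored. Treating each term separately, and assuming $\epsilon < L \norm{\xx_0-\xx^\star}^2$, we have
\[
L \norm{\xx_0-\xx^\star}^2 \exp \Bigl[-\frac{\mu}{4L}T\Bigr] \leq \epsilon
\;\Longleftrightarrow\;
T \geq \frac{4L}{\mu} \log \frac{L \norm{\xx_0-\xx^\star}^2}{\epsilon} ,
\qquad
\frac{\sigma^2}{\mu T} \leq \epsilon
\;\Longleftrightarrow\;
T \geq \frac{\sigma^2}{\mu \epsilon} .
\]
When $\sigma^2 = 0$ the second condition is vacuous, so only the first condition, which is logarithmic in $1/\epsilon$, remains.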