
\begin{abstract}

In this note we give a simple proof for the convergence of stochastic gradient descent (SGD) methods on $\mu$-convex functions under a (milder than standard) $L$-smoothness assumption. We show that, for carefully chosen stepsizes, SGD converges after $T$ iterations as $\cO\left( L \norm{\xx_0-\xx^\star}^2 \exp \bigl[-\frac{\mu}{4L}T\bigr] + \frac{\sigma^2}{\mu T} \right)$, where $\sigma^2$ measures the variance of the stochastic noise. For deterministic gradient descent (GD) and for SGD in the interpolation setting we have $\sigma^2 = 0$ and recover the exponential convergence rate. The bound matches the best known iteration complexity of GD and SGD, up to constants.

\end{abstract}
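
As a rough illustration of how the stated rate translates into an iteration complexity, consider the following sketch. It rests on two assumptions that do not appear in the abstract: the constant hidden in $\cO(\cdot)$ is treated as $1$, and a target accuracy $\epsilon > 0$ is introduced (with $\xx_0 \neq \xx^\star$). Requiring each of the two terms to be at most $\epsilon/2$ gives
\[
L \norm{\xx_0-\xx^\star}^2 \exp \bigl[-\frac{\mu}{4L}T\bigr] \leq \frac{\epsilon}{2}
\iff
T \geq \frac{4L}{\mu} \log \frac{2L \norm{\xx_0-\xx^\star}^2}{\epsilon}
\]
and
\[
\frac{\sigma^2}{\mu T} \leq \frac{\epsilon}{2}
\iff
T \geq \frac{2\sigma^2}{\mu \epsilon} .
\]
Under these assumptions, $T = \cO\left( \frac{L}{\mu} \log \frac{L \norm{\xx_0-\xx^\star}^2}{\epsilon} + \frac{\sigma^2}{\mu \epsilon} \right)$ iterations suffice to reach accuracy $\epsilon$. When $\sigma^2 = 0$ only the logarithmic term remains, which corresponds to the exponential rate in the deterministic and interpolation settings.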