\begin{abstract}
  We obtain an improved finite-sample guarantee on the linear
  convergence of stochastic gradient descent (SGD) for smooth and
  strongly convex objectives, improving from a quadratic dependence on
  the conditioning $(L/\mu)^2$ (where $L$ is a bound on the smoothness
  and $\mu$ on the strong convexity) to a linear dependence on $L/\mu$.
  Furthermore, we show how reweighting the sampling distribution
  (i.e.,~importance sampling) is necessary in order to further improve
  convergence, and obtain a linear dependence on the average
  smoothness, dominating previous results.  We also discuss importance
  sampling for SGD more broadly and show how it can also improve
  convergence in other scenarios.

  Our results are based on a connection we make between SGD and the
  \emph{randomized Kaczmarz algorithm}, which allows us to transfer
  ideas between the separate bodies of literature studying each of the
  two methods.  In particular, we recast the randomized Kaczmarz
  algorithm as an instance of SGD, and apply our results to prove its
  exponential convergence, but to the solution of a weighted least
  squares problem rather than the original least squares problem.  We
  then present a modified Kaczmarz algorithm with partially biased
  sampling that does converge to the original least squares solution
  with the same exponential convergence rate.

  \smallskip
\noindent \textbf{Keywords.} distribution reweighting, importance
sampling, Kaczmarz method, stochastic gradient descent

\end{abstract}