\begin{abstract}
  We obtain an improved finite-sample guarantee on the linear
  convergence of stochastic gradient descent (SGD) for smooth and strongly
  convex objectives, reducing the dependence on the conditioning from
  quadratic, $(L/\mu)^2$ (where $L$ is a bound on the smoothness and
  $\mu$ on the strong convexity), to linear, $L/\mu$.
  Furthermore, we show how reweighting the sampling distribution
  (i.e.,~importance sampling) is necessary in order to further improve
  convergence, and obtain a linear dependence on the average
  smoothness, dominating previous results.  We also discuss importance
  sampling for SGD more broadly and show how it can also improve
  convergence in other scenarios.

  Our results are based on a connection we make between SGD and the
  \emph{randomized Kaczmarz algorithm}, which allows us to transfer
  ideas between the separate bodies of literature studying each of the
  two methods.  In particular, we recast the randomized Kaczmarz
  algorithm as an instance of SGD and apply our results to prove its
  exponential convergence, but to the solution of a weighted least
  squares problem rather than the original least squares problem.  We
  then present a modified Kaczmarz algorithm with partially biased
  sampling which does converge to the original least squares solution
  with the same exponential convergence rate.

  \smallskip
  \noindent \textbf{Keywords.} distribution reweighting, importance
  sampling, Kaczmarz method, stochastic gradient descent

\end{abstract}