\begin{abstract}
We obtain an improved finite-sample guarantee on the linear
convergence of stochastic gradient descent for smooth and strongly
convex objectives, improving from a quadratic dependence on the
conditioning $(L/\mu)^2$ (where $L$ is a bound on the smoothness and
$\mu$ on the strong convexity) to a linear dependence on $L/\mu$.
Furthermore, we show that reweighting the sampling distribution
(i.e.~importance sampling) is necessary in order to further improve
convergence, and obtain a linear dependence on the average
smoothness, dominating previous results. We also discuss importance
sampling for SGD more broadly and show that it can improve
convergence in other scenarios as well.

Our results are based on a connection we make between SGD and the
\emph{randomized Kaczmarz algorithm}, which allows us to transfer
ideas between the separate bodies of literature studying each of the
two methods. In particular, we recast the randomized Kaczmarz
algorithm as an instance of SGD and apply our results to prove its
exponential convergence, albeit to the solution of a weighted least
squares problem rather than the original least squares problem. We
then present a modified Kaczmarz algorithm with partially biased
sampling that does converge to the original least squares solution
with the same exponential convergence rate.

\smallskip
\noindent \textbf{Keywords.} distribution
reweighting, importance sampling, Kaczmarz method, stochastic gradient descent
\end{abstract}
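As a minimal illustration of the reweighting idea (the notation here is ours and need not match the analysis that follows), suppose $F(x)=\frac{1}{n}\sum_{i=1}^n f_i(x)$, where each $f_i$ is $L_i$-smooth with $L_i>0$, and write $\bar{L}=\frac{1}{n}\sum_{i=1}^n L_i$ for the average smoothness. If index $i$ is drawn with probability $p_i>0$ and the step is reweighted accordingly,
\[
x_{k+1} = x_k - \frac{\gamma}{n p_i}\,\nabla f_i(x_k),
\]
then the stochastic gradient remains unbiased:
\[
\sum_{i=1}^n p_i\,\frac{1}{n p_i}\,\nabla f_i(x) = \frac{1}{n}\sum_{i=1}^n \nabla f_i(x) = \nabla F(x).
\]
Choosing $p_i = L_i/\sum_{j=1}^n L_j$ makes every reweighted component $\frac{1}{n p_i} f_i$ $\bar{L}$-smooth, since $\frac{L_i}{n p_i} = \frac{1}{n}\sum_{j=1}^n L_j = \bar{L}$, whereas with uniform sampling ($p_i = 1/n$) the components $f_i$ are, in the worst case, only $\max_i L_i$-smooth. This is one way in which the average, rather than the worst-case, smoothness can enter a convergence bound.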
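The link to the Kaczmarz method can be seen from an elementary identity, again in notation of our own choosing. Let $a_1,\dots,a_n$ be the nonzero rows of $A$ and $b_1,\dots,b_n$ the entries of $b$. The Kaczmarz step that projects $x_k$ onto the hyperplane $\{x : \langle a_i, x\rangle = b_i\}$ is a gradient step with unit step size on a single-row loss $g_i$:
\[
x_{k+1} = x_k + \frac{b_i - \langle a_i, x_k\rangle}{\|a_i\|^2}\,a_i = x_k - \nabla g_i(x_k),
\qquad
g_i(x) = \frac{(\langle a_i, x\rangle - b_i)^2}{2\|a_i\|^2}.
\]
If row $i$ is drawn with probability $p_i$, the expected update is therefore a gradient step on
\[
G_p(x) = \sum_{i=1}^n p_i\, g_i(x) = \frac{1}{2}\sum_{i=1}^n \frac{p_i}{\|a_i\|^2}\,(\langle a_i, x\rangle - b_i)^2,
\]
a least squares objective with row weights $p_i/\|a_i\|^2$. These weights are all equal (to $1/\|A\|_F^2$) exactly when $p_i = \|a_i\|^2/\|A\|_F^2$, in which case $G_p(x) = \|Ax-b\|^2/(2\|A\|_F^2)$; for any other sampling distribution, such as uniform sampling, $G_p$ is a genuinely weighted least squares objective whose minimizer can differ from the ordinary least squares solution when $Ax=b$ is inconsistent.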
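Partially biased sampling can be illustrated, hypothetically, by mixing the uniform and smoothness-proportional distributions; the mixture below and the parameter $\lambda$ are our own illustrative choices, not necessarily the construction analyzed in this work. For $F(x)=\frac{1}{n}\sum_{i=1}^n f_i(x)$ with each $f_i$ being $L_i$-smooth ($L_i>0$) and $\bar{L}=\frac{1}{n}\sum_{i=1}^n L_i$, take
\[
p_i = \frac{\lambda}{n} + (1-\lambda)\,\frac{L_i}{\sum_{j=1}^n L_j}, \qquad 0<\lambda<1,
\]
together with the reweighted step $x_{k+1} = x_k - \frac{\gamma}{n p_i}\nabla f_i(x_k)$. The expected update direction is $-\gamma\nabla F(x_k)$, so the iteration stays unbiased for $F$, and the two lower bounds $p_i \ge \lambda/n$ and $p_i \ge (1-\lambda)L_i/\sum_{j} L_j$ give
\[
\frac{1}{n p_i} \le \frac{1}{\lambda}
\qquad\mbox{and}\qquad
\frac{L_i}{n p_i} \le \frac{\bar{L}}{1-\lambda}.
\]
Thus no step is inflated by more than a factor $1/\lambda$, and every reweighted component is $\bar{L}/(1-\lambda)$-smooth. In the least squares setting one may take $f_i(x) = \frac{1}{2}(\langle a_i, x\rangle - b_i)^2$, for which $L_i = \|a_i\|^2$.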