
\begin{abstract}

We consider stochastic gradient descent algorithms for minimizing a non-smooth, strongly convex function.

Several forms of this algorithm, including suffix averaging, are known to achieve the optimal $O(1/T)$ convergence rate \emph{in expectation}.

We consider the simple, non-uniform averaging strategy of Lacoste-Julien et al. (2011) and prove that it achieves the optimal $O(1/T)$ convergence rate \emph{with high probability}. Our proof uses a recently developed generalization of Freedman's inequality.

%Thus we have a tight, high-probability analysis of a very simple output strategy for SGD which attains the optimal rate.

Finally, we compare several of these algorithms experimentally and show that this non-uniform averaging strategy outperforms many standard techniques while exhibiting smaller variance.

\end{abstract}
