
\begin{abstract}
While the convergence behaviors of stochastic gradient methods are
well understood \emph{in expectation}, there still exist many gaps
in the understanding of their convergence with \emph{high probability},
where the convergence rate has a logarithmic dependency on the desired
success probability parameter. In the \emph{heavy-tailed noise} setting,
where the stochastic gradient noise only has bounded
$p$-th moments for some $p\in(1,2]$, existing works could only show
bounds \emph{in expectation} for a variant of stochastic gradient
descent (SGD) with clipped gradients, or high probability bounds in
special cases (such as $p=2$) or with extra assumptions (such as
the stochastic gradients having bounded non-central moments). In this
work, using a novel analysis framework, we present new and time-optimal
(up to logarithmic factors) \emph{high probability} convergence bounds
for SGD with clipping under heavy-tailed noise for both convex and
non-convex smooth objectives using only minimal assumptions.
\end{abstract}
