\begin{abstract}
In this work, we study the convergence \emph{in high probability}
of clipped gradient methods when the noise distribution has heavy
tails, i.e., bounded $p$-th moments for some $1<p\le2$. Prior
works in this setting follow the same recipe: they combine concentration
inequalities with an inductive argument and a union bound to control the
iterates across all iterations. This approach increases
the failure probability by a factor of $T$, where $T$ is the
number of iterations. We instead propose a new analysis based
on bounding the moment generating function of a well-chosen supermartingale
sequence. We improve the dependence on $T$ in the convergence guarantee
for a wide range of algorithms with clipped gradients, including stochastic
(accelerated) mirror descent for convex objectives and stochastic
gradient descent for nonconvex objectives. This approach naturally
allows the algorithms to use time-varying step sizes and clipping
parameters when the time horizon is unknown, which does not appear possible
with the analyses in prior works. We also show that, in the case of clipped stochastic mirror
descent, problem constants, including the initial distance to the
optimum, are not required when setting the step sizes and clipping parameters.
\end{abstract}
