
\begin{abstract}

We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails. We show that a combination of gradient clipping, momentum, and normalized gradient descent readily yields convergence in high probability with the best-known rates for all tail indices, for both smooth and second-order smooth objectives. In the latter case, we provide the first such results, even in expectation, for tail index less than 2.

\end{abstract}
