\begin{abstract}
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails. We show that a combination of gradient clipping, momentum, and normalized gradient descent obtains high-probability convergence with the best-known rates for all tail indices, for both smooth and second-order smooth objectives. In the latter case, we provide the first such results, even in expectation, for tail index less than 2.
\end{abstract}