\begin{abstract}
Normalized gradient descent (GD) has shown substantial success in speeding up convergence when training linear classifiers on separable data with exponentially tailed loss functions, which include the exponential and logistic losses. In this paper, we go beyond linear models by studying normalized GD on two-layer neural networks. We prove that, for exponentially tailed losses, normalized GD drives the training loss to its global optimum at a linear rate. This is made possible by establishing certain gradient self-boundedness conditions and a log-Lipschitzness property. We also study the generalization of normalized GD for convex objectives via an algorithmic-stability analysis. In particular, by establishing finite-time generalization bounds, we show that normalized GD does not overfit during training.
\end{abstract}