abstract:aaadc9ce50b63f73.tex

1: \begin{abstract}

2:   Current state-of-the-art analyses on the convergence of gradient descent for training neural networks focus on characterizing properties of the loss landscape, such as the Polyak-\L ojaciewicz (PL) condition and the restricted strong convexity.

3:   While gradient descent converges linearly under such conditions, it remains an open question whether Nesterov's momentum enjoys accelerated convergence under similar settings and assumptions.

4:   In this work, we consider a new class of objective functions, where only a subset of the parameters satisfies strong convexity, and show Nesterov's momentum achieves acceleration in theory for this objective class.

5:   We provide two realizations of the problem class, one of which is deep ReLU networks, which --to the best of our knowledge--constitutes this work the first that proves accelerated convergence rate for non-trivial neural network architectures.

6: \end{abstract}

7: