
\begin{abstract}

Nesterov's accelerated gradient descent (\nag), an instance of the general family of ``momentum methods,'' provably achieves a faster convergence rate than gradient descent (\gd) in the convex setting. However, whether these methods are superior to~\gd~in the nonconvex setting remains open. This paper studies a simple variant of~\nag~and shows that it escapes saddle points and finds a second-order stationary point in $\tilde{O}(1/\epsilon^{7/4})$ iterations, faster than the $\tilde{O}(1/\epsilon^{2})$ iterations required by~\gd. To the best of our knowledge, this is the first Hessian-free algorithm to find a second-order stationary point faster than~\gd, and also the first single-loop algorithm with a faster rate than~\gd~even in the setting of finding a first-order stationary point. Our analysis is based on two key ideas: (1) the use of a simple Hamiltonian function, inspired by a continuous-time perspective, which~\nag~monotonically decreases at each step even for nonconvex functions, and (2) a novel framework called~\emph{\iol}, which is useful for tracking the long-term behavior of gradient-based optimization algorithms. We believe that these techniques may deepen our understanding of both acceleration algorithms and nonconvex optimization.

% The key technique relies on keeping track of an energy function of AGD, which comes from the ODE perspective; we believe it may be of interest to the general nonconvex optimization community.

% Linking movement with progress, locality.

\end{abstract}
