
\begin{abstract}

In this paper, we introduce \textsc{Apollo}, a quasi-Newton method for nonconvex stochastic optimization, which dynamically incorporates the curvature of the loss function by approximating the Hessian via a diagonal matrix.

Importantly, updating and storing the diagonal approximation of the Hessian is as efficient as in adaptive first-order optimization methods, with linear complexity in both time and memory.

To handle nonconvexity, we replace the Hessian with its rectified absolute value, which is guaranteed to be positive-definite.

Experiments on three vision and language tasks show that \textsc{Apollo} achieves significant improvements over other stochastic optimization methods, including SGD and variants of Adam, in terms of both convergence speed and generalization performance.

The implementation of the algorithm is available at \url{https://github.com/XuezheMax/apollo}.

\end{abstract}
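
As a minimal sketch of the rectification mentioned in the abstract, suppose $B$ denotes the diagonal approximation of the Hessian, $g$ the stochastic gradient, $\eta$ the learning rate, and $\sigma > 0$ a threshold. These symbols are assumptions introduced here for illustration and do not appear in the abstract. One way to write a rectified absolute value and the resulting parameter step is
\[
  D = \max\bigl(|B|,\, \sigma\bigr), \qquad \theta \leftarrow \theta - \eta\, D^{-1} g,
\]
where the absolute value and the maximum are taken entrywise on the diagonal. Every diagonal entry of $D$ is at least $\sigma > 0$, so $D$ is positive-definite even when $B$ has negative or zero entries, and applying $D^{-1}$ costs time and memory linear in the number of parameters.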
