1: \begin{abstract}
2: %\avg{w_*}{w_*}
3: Heavy Ball (HB) is nowadays one of the most popular momentum methods in non-convex 
4: optimization. It has been widely observed that incorporating the Heavy Ball dynamic into gradient-based methods accelerates the training of modern machine learning models. However, progress on establishing a theoretical foundation for this acceleration lags far behind its empirical success. 
5: Existing provable acceleration results cover only quadratic or close-to-quadratic functions, because current techniques for showing HB's acceleration are limited to the case where the Hessian is fixed. In this work, we develop new techniques that help show acceleration beyond quadratics by analyzing how the change of the Hessian between two consecutive time points affects the convergence speed. Based on our technical results, we identify a class of Polyak-\L{}ojasiewicz (PL) optimization problems for which provable acceleration can be achieved via HB. %, when the non-convexity is averaged-out. Some concrete examples are provided in this paper.
6: %The examples include training a diagonal network which has drawn growing attention in recent years.
7: Moreover, our analysis demonstrates the benefit of setting the momentum parameter adaptively.
8: % for each dimension.
9: %To our knowledge, our result is the first provable acceleration of momentum methods under PL in the discrete time.  
10: \end{abstract}
11: