1: \begin{abstract}

2: 	%{[Sijia's edits]}

3: 	\vspace*{-0.1in}

4: 	This paper studies a class of adaptive gradient-based momentum algorithms that update the search directions and learning rates simultaneously using past gradients. This class, which we refer to as the ``Adam-type'', includes popular algorithms such as Adam \citep{kingma2014adam},

5: 	AMSGrad \citep{reddi2018convergence}, and {AdaGrad} \citep{duchi2011adaptive}.

6: 	Despite their popularity in training deep neural networks (DNNs), the convergence of these algorithms for solving  non-convex problems remains an \textit{open} question.

7:

8: 	We develop an analysis framework and a set of mild sufficient conditions that guarantee the convergence of Adam-type methods, with a convergence rate of order %$O(1/\sqrt{T})$

9: 	{ $O(\log{T}/\sqrt{T})$}

10: 	for non-convex stochastic optimization.

11: 	Our convergence analysis applies to a new algorithm called AdaFom (AdaGrad with First Order Momentum).

12: 	{We show that the conditions are essential, by identifying concrete examples in which violating the conditions makes an algorithm diverge.}

13: 	Besides providing one of the first comprehensive analyses of Adam-type methods in the non-convex setting, our results can also help practitioners easily monitor the progress of algorithms and determine their convergence behavior.

14: 	%Further, they  serve as the basis upon which the theorists can sharpen the rates, or derive more relaxed conditions.

15:

16: % 	\textcolor{Sijia_color}{Experiments on both synthetic and real datasets are provided   to support   our  convergence results on  Adam-type algorithms.}

17: 	%Our study  could also be extended to a broader class of adaptive gradient methods in machine learning and optimization.

18: 		\vspace*{-0.1in}

19: \end{abstract}

20: