1: \begin{abstract}
2: 	%{[Sijia's edits]}
3: 	\vspace*{-0.1in}
4: 	This paper studies a class of adaptive gradient-based momentum algorithms that update the search directions and learning rates simultaneously using past gradients. This class, which we refer to as the ``Adam-type'', includes popular algorithms such as Adam \citep{kingma2014adam},
5: 	AMSGrad \citep{reddi2018convergence}, and {AdaGrad} \citep{duchi2011adaptive}.
6: 	Despite their popularity in training deep neural networks (DNNs), the convergence of these algorithms for solving non-convex problems remains an \textit{open} question.
7: 	
8: 	In this paper, we develop an analysis framework and a set of mild sufficient conditions that guarantee the convergence of the Adam-type methods, with a convergence rate of order %$O(1/\sqrt{T})$ 
9: 	{ $O(\log{T}/\sqrt{T})$} 
10: 	for non-convex stochastic optimization.
11: 	Our convergence analysis applies to a new algorithm called AdaFom (AdaGrad with First Order Momentum).
12: 	{We show that the conditions are essential by identifying concrete examples in which violating them causes an algorithm to diverge.}
13: 	Besides providing one of the first comprehensive analyses of Adam-type methods in the non-convex setting, our results can also help practitioners easily monitor the progress of algorithms and determine their convergence behavior.
14: 	%Further, they  serve as the basis upon which the theorists can sharpen the rates, or derive more relaxed conditions. 
15: 	
16: % 	\textcolor{Sijia_color}{Experiments on both synthetic and real datasets are provided   to support   our  convergence results on  Adam-type algorithms.}
17: 	%Our study  could also be extended to a broader class of adaptive gradient methods in machine learning and optimization.
18: 		\vspace*{-0.1in}
19: \end{abstract}
20: