\begin{abstract}
%{[Sijia's edits]}
\vspace*{-0.1in}
This paper studies a class of adaptive gradient-based momentum algorithms that update the search directions and learning rates simultaneously using past gradients. This class, which we refer to as the ``Adam-type'', includes popular algorithms such as Adam \citep{kingma2014adam}, AMSGrad \citep{reddi2018convergence}, and AdaGrad \citep{duchi2011adaptive}.
Despite their popularity in training deep neural networks (DNNs), the convergence of these algorithms for solving non-convex problems remains an \textit{open} question.

In this paper, we develop an analysis framework and a set of mild sufficient conditions that guarantee the convergence of Adam-type methods, with a convergence rate of order $O(\log{T}/\sqrt{T})$ for non-convex stochastic optimization.
Our convergence analysis also applies to a new algorithm called AdaFom (AdaGrad with First Order Momentum).
We show that the conditions are essential by identifying concrete examples in which violating them makes an algorithm diverge.
Besides providing one of the first comprehensive analyses of Adam-type methods in the non-convex setting, our results can also help practitioners monitor the progress of these algorithms and determine their convergence behavior.
%Further, they serve as the basis upon which the theorists can sharpen the rates, or derive more relaxed conditions.

% \textcolor{Sijia_color}{Experiments on both synthetic and real datasets are provided to support our convergence results on Adam-type algorithms.}
%Our study could also be extended to a broader class of adaptive gradient methods in machine learning and optimization.
\vspace*{-0.1in}
\end{abstract}
