\begin{abstract}
Although \adam\ is a very popular algorithm for optimizing the weights of neural networks,
it has recently been shown that it can diverge even in simple convex optimization examples.
Several variants of \adam\ have been proposed to circumvent this
convergence issue.
In this work, we study the \adam\ algorithm for smooth nonconvex optimization under
a boundedness assumption on the adaptive learning rate.
The bound on the adaptive step size depends on the Lipschitz constant of the
gradient of the objective function and yields theoretically safe adaptive
step sizes.
Under this boundedness assumption, we establish a novel first-order convergence rate in both the deterministic
and stochastic settings. Furthermore, we derive convergence rates for the sequence of function values
using the Kurdyka--\L{}ojasiewicz property.\\
% which is satisfied for most deep neural networks.
\end{abstract}
