\begin{abstract}
  Although \adam\ is a very popular algorithm for optimizing the weights of neural networks,
  it has recently been shown that it can diverge even in simple convex optimization examples.
  Several variants of \adam\ have been proposed to circumvent this
  convergence issue.
  In this work, we study the \adam\ algorithm for smooth nonconvex optimization under
  a boundedness assumption on the adaptive learning rate.
  The bound on the adaptive step size depends on the Lipschitz constant of the
  gradient of the objective function and yields theoretically safe adaptive
  step sizes.
  Under this boundedness assumption, we show a novel first-order convergence rate result in both deterministic
  and stochastic contexts. Furthermore, we establish convergence rates of the function value sequence
  using the Kurdyka--\L{}ojasiewicz property.
  % which is satisfied for most deep neural networks.
\end{abstract}
