\begin{abstract}
  Although \adam\ is a very popular algorithm for optimizing the weights of neural networks,
  it has recently been shown that it can diverge even in simple convex optimization examples.
  Several variants of \adam\ have been proposed to circumvent this
  convergence issue.
  In this work, we study the \adam\ algorithm for smooth nonconvex optimization under
  a boundedness assumption on the adaptive learning rate.
  The bound on the adaptive step size depends on the Lipschitz constant of the
  gradient of the objective function and yields theoretically safe adaptive
  step sizes.
  Under this boundedness assumption, we prove a novel first-order convergence rate in both deterministic
  and stochastic contexts. Furthermore, we establish convergence rates of the function value sequence
  using the Kurdyka--\L{}ojasiewicz property.\\
  % which is satisfied for most deep neural networks.
\end{abstract}