\begin{abstract}
\adam\ is a popular variant of stochastic gradient descent for
finding a local minimizer of a function.
In the constant stepsize regime, assuming that the objective function is differentiable and non-convex,
we establish the long-run convergence of the iterates to a stationary point under a stability condition.
The key ingredient is the introduction of a continuous-time
version of \adam, in the form of a non-autonomous ordinary
differential equation (ODE).
This continuous-time system is a relevant approximation of the \adam\
iterates, in the sense that the interpolated \adam\ process
converges weakly to the solution of the ODE.
The existence and the uniqueness of the
solution are established. We further show the convergence of the solution
towards the critical points of the objective function
and quantify its convergence rate under a \L{}ojasiewicz assumption.
Then, we introduce a novel decreasing stepsize version of \adam.
Under mild assumptions, it is shown that the iterates are almost surely bounded
and converge almost surely to critical points of the objective function.
Finally, we analyze the fluctuations of the algorithm by means of a conditional central limit theorem.
\end{abstract}