\begin{abstract}
  \adam\ is a popular variant of stochastic gradient descent for
  finding a local minimizer of a function.
  In the constant stepsize regime, assuming that the objective function is differentiable and possibly non-convex,
  we establish the long-run convergence of the iterates to a stationary point under a stability condition.
  The key ingredient is the introduction of a continuous-time
  version of \adam, in the form of a non-autonomous ordinary
  differential equation (ODE).
  This continuous-time system is a relevant approximation of the \adam\
  iterates, in the sense that the interpolated \adam\ process
  converges weakly towards the solution to the ODE as the stepsize tends to zero.
  We establish the existence and uniqueness of this solution, show that it converges
  towards the critical points of the objective function,
  and quantify its convergence rate under a \L{}ojasiewicz assumption.
  Then, we introduce a novel decreasing-stepsize version of \adam.
  Under mild assumptions, we show that its iterates are almost surely bounded
  and converge almost surely to critical points of the objective function.
  Finally, we analyze the fluctuations of the algorithm by means of a conditional central limit theorem.
\end{abstract}
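For concreteness, the Python sketch below implements the textbook constant-stepsize \adam\ recursion: exponential moving averages of the gradient and of its coordinate-wise square, with bias correction, as introduced by Kingma and Ba. It is only an illustration under stated assumptions. The names \texttt{adam} and \texttt{double\_well\_grad}, the default hyperparameters, the optional Gaussian gradient noise and the non-convex test function $f(x)=\sum_i (x_i^2-1)^2$ are hypothetical choices. The exact recursion analyzed in this paper may differ in details such as the placement of $\varepsilon$ or the form of the bias-correction terms.

```python
import math
import random


def adam(grad, x0, stepsize=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
         n_iter=10_000, noise_std=0.0, seed=0):
    """Constant-stepsize Adam with bias correction (textbook form, a sketch).

    grad(x) must return the gradient at x as a list of floats.  If
    noise_std > 0, i.i.d. Gaussian noise is added to each gradient
    coordinate to mimic a stochastic gradient oracle (an assumption).
    """
    rng = random.Random(seed)
    x = [float(xi) for xi in x0]
    m = [0.0] * len(x)  # exponential moving average of gradients
    v = [0.0] * len(x)  # exponential moving average of squared gradients
    for n in range(1, n_iter + 1):
        g = [gi + noise_std * rng.gauss(0.0, 1.0) for gi in grad(x)]
        m = [beta1 * mi + (1.0 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1.0 - beta2) * gi * gi for vi, gi in zip(v, g)]
        # Bias-corrected moment estimates (both averages start at zero).
        m_hat = [mi / (1.0 - beta1 ** n) for mi in m]
        v_hat = [vi / (1.0 - beta2 ** n) for vi in v]
        # Coordinate-wise normalized step with a constant stepsize.
        x = [xi - stepsize * mh / (math.sqrt(vh) + eps)
             for xi, mh, vh in zip(x, m_hat, v_hat)]
    return x


def double_well_grad(x):
    """Gradient of the hypothetical non-convex test function
    f(x) = sum_i (x_i**2 - 1)**2, whose critical points in each
    coordinate are -1, 0 and 1."""
    return [4.0 * xi * (xi * xi - 1.0) for xi in x]


if __name__ == "__main__":
    x_final = adam(double_well_grad, [2.0, -0.5])
    # The two coordinates are expected to settle near 1 and -1.
    print(x_final)
```

With a small constant stepsize and no gradient noise, the iterates started at $(2,-0.5)$ are expected to settle near the stationary points $1$ and $-1$ of the respective coordinates. This is a purely numerical illustration of long-run behavior, not a check of the stability condition or of any other assumption of the paper.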