\begin{abstract}
  \adam\ is a popular variant of stochastic gradient descent for
  finding a local minimizer of a function.
  In the constant stepsize regime, assuming that the objective function is differentiable and possibly non-convex,
  we establish the long-run convergence of the iterates to a stationary point under a stability condition.
  The key ingredient is the introduction of a continuous-time
  version of \adam\ in the form of a non-autonomous ordinary
  differential equation (ODE).
  This continuous-time system is a relevant approximation of the \adam\
  iterates, in the sense that the interpolated \adam\ process
  converges weakly towards the solution to the ODE as the stepsize tends to zero.
  We establish the existence and uniqueness of this
  solution, show that it converges
  towards the set of critical points of the objective function,
  and quantify its convergence rate under a \L{}ojasiewicz assumption.
  Then, we introduce a novel decreasing stepsize version of \adam.
  Under mild assumptions, we show that its iterates are almost surely bounded
  and converge almost surely to critical points of the objective function.
  Finally, we analyze the fluctuations of the algorithm by means of a conditional central limit theorem.
\end{abstract}
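For orientation only, we recall one standard form of the \adam\ recursion and sketch, heuristically, how a non-autonomous ordinary differential equation can emerge from it in the constant stepsize regime.
The notation below ($F$, $g_{n+1}$, $m_n$, $v_n$, $\beta_1$, $\beta_2$, $\varepsilon$, $a$, $b$, $S$) is introduced here for illustration and is an assumption; the sketch is not claimed to coincide with the exact system, scaling, or hypotheses analyzed in this paper.
Let $F$ be the objective function, $\gamma>0$ the stepsize, $\beta_1,\beta_2\in[0,1)$, $\varepsilon>0$, and let $g_{n+1}$ be a stochastic gradient at $x_n$ whose conditional mean is $\nabla F(x_n)$ and whose coordinatewise conditional second moment is $S(x_n)$.
Starting from $m_0=v_0=0$, with squares, square roots and divisions taken coordinatewise, the iterates read
\begin{align*}
  % Illustrative notation (assumption): m_n and v_n are the first- and second-moment estimates.
  m_{n+1} &= \beta_1 m_n + (1-\beta_1)\, g_{n+1}, \\
  v_{n+1} &= \beta_2 v_n + (1-\beta_2)\, g_{n+1}^{\odot 2}, \\
  x_{n+1} &= x_n - \gamma\, \frac{(1-\beta_1^{n+1})^{-1} m_{n+1}}{\varepsilon + \sqrt{v_{n+1}/(1-\beta_2^{n+1})}}.
\end{align*}
If one sets $\beta_1 = 1-a\gamma$ and $\beta_2 = 1-b\gamma$ for fixed $a,b>0$ and identifies step $n$ with time $t=n\gamma$, then $m_{n+1}-m_n = a\gamma\,(g_{n+1}-m_n)$, $v_{n+1}-v_n = b\gamma\,(g_{n+1}^{\odot 2}-v_n)$, and $\beta_1^{n}=(1-a\gamma)^{t/\gamma}\to e^{-at}$ as $\gamma\to 0$.
Averaging out the noise then suggests, for $t>0$, the system
\begin{align*}
  % Heuristic small-stepsize limit (assumption: a, b and S as defined above).
  \dot x(t) &= -\frac{(1-e^{-at})^{-1} m(t)}{\varepsilon + \sqrt{v(t)/(1-e^{-bt})}}, \\
  \dot m(t) &= a\,\bigl(\nabla F(x(t)) - m(t)\bigr), \\
  \dot v(t) &= b\,\bigl(S(x(t)) - v(t)\bigr).
\end{align*}
The explicit dependence on $t$ through the bias-correction factors $1-e^{-at}$ and $1-e^{-bt}$ is what makes such a system non-autonomous.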
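For reference, one standard form of the \L{}ojasiewicz gradient inequality is recalled below; it is stated here as an illustrative assumption, with notation ($F$, $x^\ast$, $c$, $\theta$, $U$) introduced for this purpose, and the precise condition used in this paper may differ.
For a critical point $x^\ast$ of the objective function $F$, it requires constants $c>0$ and $\theta\in[\tfrac12,1)$ and a neighborhood $U$ of $x^\ast$ such that
\begin{equation*}
  % Illustrative form (assumption): the exponent convention varies across references.
  |F(x)-F(x^\ast)|^{\theta} \le c\,\|\nabla F(x)\| \qquad \text{for all } x\in U.
\end{equation*}
It holds, for instance, for real-analytic functions. For gradient-type dynamics, the exponent typically separates exponential convergence ($\theta=\tfrac12$) from polynomial rates ($\theta\in(\tfrac12,1)$).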

22: