
\begin{abstract}

One of the most popular training algorithms for deep neural networks is
Adaptive Moment Estimation (Adam), introduced by Kingma and Ba.
Despite its success in many applications, there is no satisfactory
convergence analysis: only local convergence can be shown for batch mode under some
restrictions on the hyperparameters, and counterexamples
exist for incremental mode. Recent results show that for simple
quadratic objective functions, limit cycles of period 2 exist in batch mode,
but only for atypical hyperparameters and only for the algorithm without bias correction.

%More general there are several more adaptive gradient methods which try to estimate a fitting learning rate and / or search direction from the training data to improve the learning process compared to pure gradient descent with fixed learningrate.

We extend the convergence analysis for Adam in batch mode with bias correction
and show that,
even for quadratic objective functions, the simplest case of convex functions,
2-limit-cycles exist for all choices of the hyperparameters.
We analyze the stability of these limit cycles and relate our analysis to other
results where approximate convergence was shown, but under the additional assumption of
bounded gradients, which does not apply to quadratic functions.
The investigation relies heavily on computer algebra
due to the complexity of the equations.

\keywords{Adam optimizer \and convergence \and computer algebra \and dynamical system \and limit cycle}

\end{abstract}
