\begin{abstract}
One of the most popular training algorithms for deep neural networks is Adaptive Moment Estimation (Adam), introduced by Kingma and Ba.
Despite its success in many applications, there is no satisfactory
convergence analysis: only local convergence can be shown for batch mode under some
restrictions on the hyperparameters, and counterexamples
exist for incremental mode. Recent results show that for simple
quadratic objective functions limit cycles of period 2 exist in batch mode,
but only for atypical hyperparameters and only for the algorithm without bias correction.
%More generally, there are several other adaptive gradient methods which try to estimate a suitable learning rate and/or search direction from the training data to improve the learning process compared to pure gradient descent with a fixed learning rate.
We extend the convergence analysis for Adam in batch mode with bias correction
and show that,
even for quadratic objective functions, the simplest case of convex functions,
2-limit-cycles exist for all choices of the hyperparameters.
We analyze the stability of these limit cycles and relate our analysis to other
results where approximate convergence was shown, but under the additional assumption of
bounded gradients, which does not apply to quadratic functions.
The investigation relies heavily on computer algebra
due to the complexity of the equations.
\keywords{Adam optimizer \and convergence \and computer algebra \and dynamical system \and limit cycle}
\end{abstract}