1: \begin{abstract}
2: The dynamic behavior of the RMSprop and Adam algorithms is studied through a combination of careful numerical experiments and theoretical explanations.
3: Three types of qualitative features are observed in the training loss curve: fast initial convergence, oscillations and large spikes.
4: The sign gradient descent (signGD) algorithm, which is the limit of
5: Adam when taking the learning rate to $0$ while keeping the momentum parameters fixed,
6: is used to explain the fast initial convergence.
7: For the late phase of Adam, three different types of qualitative patterns are observed depending on the choice of the hyper-parameters:
8: oscillations, spikes and divergence.
9: In particular, Adam converges faster and more smoothly when the values of the two momentum factors are close to each other.
10: \end{abstract}
11: