\begin{abstract}

Gradient normalization and soft clipping are two popular techniques for tackling instability issues and improving convergence of stochastic gradient descent (SGD) with momentum.

In this article, we study these types of methods through the lens of dissipative Hamiltonian systems. Gradient normalization and certain types of soft clipping algorithms can be seen as (stochastic) implicit-explicit Euler discretizations of dissipative Hamiltonian systems, where the kinetic energy function determines the type of clipping that is applied.

We use a unified theory from dynamical systems to show that all of these schemes converge almost surely to stationary points of the objective function.

\end{abstract}
