\begin{abstract}%
Training deep neural networks (DNNs) can be difficult due to the occurrence of vanishing/exploding gradients during weight optimization.
To avoid this problem, we propose a class of DNNs stemming from the time discretization of Hamiltonian systems.
The time-invariant version of the corresponding Hamiltonian models enjoys marginal stability, a property that, as shown in previous works and for specific DNN architectures, can mitigate convergence to zero or divergence of gradients.
In the present paper, we formally study this feature by deriving and analysing the backward gradient dynamics in continuous time.
The proposed Hamiltonian framework, besides encompassing existing networks inspired by marginally stable ODEs, allows one to derive new and more expressive architectures.
The good performance of the novel DNNs is demonstrated on benchmark classification problems, including digit recognition using the MNIST dataset.
\\
\textbf{Keywords: } Deep Neural Networks, Dynamical Systems, Hamiltonian Systems, Gradient Dynamics.%
\end{abstract}
