\begin{abstract}
Training neural ODEs on large datasets has not been tractable because the
adaptive numerical ODE solver must be allowed to refine its step size
to very small values. In practice, this leads to dynamics equivalent to many
hundreds or even thousands of layers. In this paper, we overcome this apparent
difficulty by introducing a theoretically grounded combination of optimal
transport and stability regularizations that encourages neural ODEs to prefer
simpler dynamics out of all the dynamics that solve a problem well. Simpler
dynamics lead to faster convergence and to fewer discretizations of the solver,
considerably decreasing wall-clock time without loss in performance. Our
approach allows us to train neural ODE-based generative models to the same
performance as the unregularized dynamics, with significant reductions in
training time. This brings neural ODEs closer to practical relevance in
large-scale applications.
\end{abstract}
