\begin{abstract}
  Neural networks with a large number of parameters admit a mean-field
  description, which has recently served as a theoretical explanation
  for the favorable training properties of ``overparameterized'' models.
  In this regime, gradient descent obeys a
  deterministic partial differential equation (PDE) whose solution
  converges to a global optimum for networks with a single hidden layer
  under appropriate assumptions.  In this work, we propose a non-local
  mass transport dynamics that leads to a modified PDE with the same
  minimizer.  We implement this non-local dynamics as a stochastic
  neuronal birth-death process and prove that it accelerates the
  rate of convergence in the mean-field limit.  We subsequently
  realize this PDE with two classes of numerical schemes that converge
  to the mean-field equation, each of which can easily be implemented
  for neural networks with finite numbers of parameters.  We
  illustrate our algorithms with two models to provide intuition for
  the mechanism through which convergence is accelerated.
\end{abstract}
