\begin{abstract}
Neural networks with a large number of parameters admit a mean-field
description, which has recently served as a theoretical explanation
for the favorable training properties of ``overparameterized'' models.
In this regime, gradient descent obeys a
deterministic partial differential equation (PDE) that converges to
a globally optimal solution for networks with a single hidden layer
under appropriate assumptions. In this work, we propose a non-local
mass transport dynamics that leads to a modified PDE with the same
minimizer. We implement this non-local dynamics as a stochastic
neuronal birth-death process and we prove that it accelerates the
rate of convergence in the mean-field limit. We subsequently
realize this PDE with two classes of numerical schemes that converge
to the mean-field equation, each of which can easily be implemented
for neural networks with finite numbers of parameters. We
illustrate our algorithms with two models to provide intuition for
the mechanism through which convergence is accelerated.
\end{abstract}
