\begin{abstract}
  Neural networks with a large number of parameters admit a mean-field
  description, which has recently served as a theoretical explanation
  for the favorable training properties of ``overparameterized'' models.
  In this regime, gradient descent obeys a
  deterministic partial differential equation (PDE) that converges to
  a globally optimal solution for networks with a single hidden layer
  under appropriate assumptions.  In this work, we propose a non-local
  mass transport dynamics that leads to a modified PDE with the same
  minimizer.  We implement this non-local dynamics as a stochastic
  neuronal birth-death process and we prove that it accelerates the
  rate of convergence in the mean-field limit.  We subsequently
  realize this PDE with two classes of numerical schemes that converge
  to the mean-field equation, each of which can easily be implemented
  for neural networks with finite numbers of parameters.  We
  illustrate our algorithms with two models to provide intuition for
  the mechanism through which convergence is accelerated.
\end{abstract}