\begin{abstract}
  Neural networks, a central tool in machine learning, have
  demonstrated remarkable, high-fidelity performance on image
  recognition and classification tasks.  These successes evince an
  ability to accurately represent high-dimensional functions, but
%   potentially of great use in computational and applied mathematics.
%   That said,
% Networks, however, require to be optimized or `trained' and
  rigorous results about the approximation
  error of neural networks after training are few. Here we establish conditions for global convergence of the standard optimization algorithm used in
  machine learning applications, stochastic gradient descent (SGD), and quantify the scaling of its error with the size of the network. This is done by reinterpreting SGD as the
  evolution of a particle system with interactions governed by a
  potential related to the objective or ``loss'' function used to
  train the network. We show that, when the number $n$ of units
  is large, the empirical distribution of the particles descends on a
  convex landscape towards the global minimum at a rate independent of $n$, with a resulting approximation error that universally scales as
  $O(n^{-1})$. These properties are established in the form of a Law of Large Numbers and a Central Limit Theorem for
  the empirical distribution.
%   and, remarkably, these scaling results do not depend on the
%   dimensionality of the domain of the function that we seek to
%   represent.  
Our analysis also quantifies the scale and nature of the
  noise introduced by SGD and provides
  guidelines for the step size and batch size to use when training a
  neural network. We illustrate our findings on examples in which we
  train neural networks to learn the energy function of the continuous
  3-spin model on the sphere.  The approximation error scales as our
  analysis predicts in dimensions as high as $d=25$.
\end{abstract}
