2505c88d7f71116f.tex
1: \begin{abstract}
2: In the mean field regime, neural networks are appropriately scaled
3: so that as the width tends to infinity, the learning dynamics tends
4: to a nonlinear and nontrivial dynamical limit, known as the mean field
5: limit. This offers a way to study large-width neural networks by analyzing
6: the mean field limit. Recent works have successfully applied such
7: analysis to two-layer networks and provided global convergence guarantees.
8: The extension to multilayer networks, however, has been a highly challenging
9: puzzle, and little is known about the optimization efficiency in the
10: mean field regime when there are more than two layers.
11: 
12: In this work, we prove a global convergence result for unregularized
13: feedforward three-layer networks in the mean field regime. We first
14: develop a rigorous framework to establish the mean field limit of
15: three-layer networks under stochastic gradient descent training. To
16: that end, we propose the idea of a \textit{neuronal embedding}, which
17: comprises a fixed probability space that encapsulates neural networks
18: of arbitrary sizes. The identified mean field limit is then used to
19: prove a global convergence guarantee under suitable regularity and
20: convergence mode assumptions, which -- unlike previous works on two-layer
21: networks -- does not rely critically on convexity. Underlying the
22: result is a universal approximation property, natural to neural networks,
23: which importantly is shown to hold at \textit{any} finite training
24: time (not necessarily at convergence) via an algebraic topology argument.
25: \end{abstract}
26: