\begin{abstract}
  In this paper, we investigate the limiting behavior of a
  continuous-time counterpart of the Stochastic Gradient Descent (SGD)
  algorithm applied to two-layer overparameterized neural networks, as
  the number of neurons (\ie, the size of the hidden layer)
  $N \to \plusinfty$.  Following a probabilistic approach, we show
  `propagation of chaos' for the particle system defined by this
  continuous-time dynamics under different scenarios, indicating that
  the statistical interaction between the particles asymptotically
  vanishes. In particular, we establish quantitative convergence with
  respect to $N$ of any particle to a solution of a mean-field
  McKean--Vlasov equation in the metric space endowed with the
  Wasserstein distance. In comparison to previous works on the
  subject, we consider settings in which the sequence of stepsizes in
  SGD can potentially depend on the number of neurons and on the
  iteration index. We then identify two regimes under which different
  mean-field limits are obtained, one of them corresponding to an
  implicitly regularized version of the minimization problem at
  hand. We perform various experiments on real datasets to validate
  our theoretical results, assessing the existence of these two
  regimes on classification problems and illustrating our convergence
  results.
\end{abstract}