
\begin{abstract}
  In this paper, we investigate the limiting behavior of a
  continuous-time counterpart of the Stochastic Gradient Descent (SGD)
  algorithm applied to two-layer overparameterized neural networks, as
  the number of neurons (\ie, the size of the hidden layer)
  $N \to \plusinfty$.  Following a probabilistic approach, we show
  `propagation of chaos' for the particle system defined by this
  continuous-time dynamics under different scenarios, indicating that
  the statistical interaction between the particles asymptotically
  vanishes. In particular, we establish quantitative convergence with
  respect to $N$ of any particle to a solution of a mean-field
  McKean-Vlasov equation in the metric space endowed with the
  Wasserstein distance. In comparison to previous works on the
  subject, we consider settings in which the sequence of stepsizes in
  SGD can potentially depend on the number of neurons and on the
  iterations. We then identify two regimes under which different
  mean-field limits are obtained, one of them corresponding to an
  implicitly regularized version of the minimization problem at
  hand. We perform various experiments on real datasets to validate
  our theoretical results, assessing the existence of these two
  regimes on classification problems and illustrating our convergence
  results.
\end{abstract}
