\begin{abstract}

    In this paper, we study the feature learning ability of two-layer neural networks in the mean-field regime
    through the lens of kernel methods.

    To focus on the dynamics of the kernel induced by the first layer, we consider a two-timescale limit in which the second layer moves much faster than the first layer.

    In this limit, the learning problem reduces to a minimization problem over the intrinsic kernel.

    We then show global convergence of the mean-field Langevin dynamics and derive time and particle discretization errors.

    We also demonstrate that two-layer neural networks can learn a union of multiple reproducing kernel Hilbert spaces more efficiently than any kernel method,
    and that neural networks acquire a data-dependent kernel which aligns with the target function.

    In addition, we develop a label noise procedure that converges to the global optimum, and we show that the degrees of freedom appear as an implicit regularization.

    % Finally, we verify our theoretical findings by numerical experiments.

\end{abstract}
