71c9b6ecdbae0ddf.tex
1: \begin{abstract}
2:     In this paper, we study the feature learning ability of two-layer neural networks in the mean-field regime
3:     through the lens of kernel methods.
4:     To focus on the dynamics of the kernel induced by the first layer, we utilize a two-timescale limit, where the second layer moves much faster than the first layer.
5:     In this limit, the learning problem reduces to a minimization problem over the intrinsic kernel.
6:     Then, we show the global convergence of the mean-field Langevin dynamics and derive the time and particle discretization errors.
7:     We also demonstrate that two-layer neural networks can learn a union of multiple reproducing kernel Hilbert spaces more efficiently than any kernel method,
8:     and that neural networks acquire a data-dependent kernel that aligns with the target function.
9:     In addition, we develop a label noise procedure that converges to the global optimum, and we show that the degrees of freedom appear as an implicit regularization.
10:     % Finally, we verify our theoretical findings by numerical experiments.
11: \end{abstract}
12: