\begin{abstract}
In this paper, we study the feature learning ability of two-layer neural networks in the mean-field regime
through the lens of kernel methods.
To focus on the dynamics of the kernel induced by the first layer, we utilize a two-timescale limit, where the second layer moves much faster than the first layer.
In this limit, the learning problem reduces to a minimization problem over the intrinsic kernel.
We then show the global convergence of the mean-field Langevin dynamics and derive the time and particle discretization errors.
We also demonstrate that two-layer neural networks can learn a union of multiple reproducing kernel Hilbert spaces more efficiently than any kernel method,
and that neural networks acquire a data-dependent kernel that aligns with the target function.
In addition, we develop a label noise procedure that converges to the global optimum, and we show that the degrees of freedom appear as an implicit regularization.
% Finally, we verify our theoretical findings by numerical experiments.
\end{abstract}