\begin{abstract}
We analyze the generalization properties of two-layer neural networks in the
neural tangent kernel (NTK) regime, trained with gradient descent (GD).
For early-stopped GD, we derive fast rates of convergence that are known to be
minimax optimal in the framework of non-parametric regression in reproducing
kernel Hilbert spaces.
Along the way, we precisely track the number of hidden neurons
required for generalization and improve over existing results.
We further show that, during training, the weights remain in a vicinity of
their initialization, whose radius depends on structural assumptions such as
the degree of smoothness of the regression function and the eigenvalue decay
of the integral operator associated with the NTK.
\end{abstract}