1: \begin{abstract}
2:
3: We analyze the generalization properties of two-layer neural networks in the
4: neural tangent kernel (NTK) regime, trained with gradient descent (GD).
5: For early-stopped GD,
6: we derive fast rates of convergence that are known to be minimax optimal
7: in the framework of non-parametric regression in reproducing kernel Hilbert spaces.
8: Along the way, we precisely track the number of hidden neurons
9: required for generalization and improve over existing results.
10: We further show that, during training, the weights
11: remain in a vicinity of their initialization, with a radius that depends on structural assumptions such as the degree of
12: smoothness of the regression function and the eigenvalue decay of the integral operator associated with the NTK.
13: \end{abstract}