\begin{abstract}

We analyze the generalization properties of two-layer neural networks in the
neural tangent kernel (NTK) regime, trained with gradient descent (GD).
For early-stopped GD,
we derive fast rates of convergence that are known to be minimax optimal
in the framework of non-parametric regression in reproducing kernel Hilbert spaces.
Along the way, we precisely keep track of the number of hidden neurons
required for generalization and improve over existing results.
We further show that the weights during training
remain in a neighborhood of their initialization, whose radius depends on structural assumptions such as the degree of
smoothness of the regression function and the eigenvalue decay of the integral operator associated with the NTK.
\end{abstract}