\begin{abstract}
The study of Neural Tangent Kernels (NTKs) has provided much-needed insight 
into the convergence and generalization properties of neural networks in the 
over-parametrized (wide) limit by approximating the network using a 
first-order Taylor expansion with respect to its weights in the neighborhood 
of their initialization values.  This allows neural network training to be 
analyzed from the perspective of reproducing kernel Hilbert spaces (RKHS), 
which is informative in the over-parametrized regime but a poor approximation 
for narrower networks, whose weights change more during training.  Our goal is 
to extend beyond the limits of NTK toward a more general theory.  We construct 
an exact power-series representation of the neural network in a finite 
neighborhood of the initial weights as an inner product of two feature maps, 
one from data space and one from weight-step space, into feature space, 
allowing neural network training to be analyzed from the perspective of 
reproducing kernel {\em Banach} space (RKBS).  We prove that, regardless of 
width, the training sequence produced by gradient descent can be exactly 
replicated by regularized sequential learning in RKBS.  Using this, we present 
a novel bound on uniform convergence in which the iteration count and learning 
rate play a central role, giving new theoretical insight into neural network 
training.
\end{abstract}