\begin{abstract}
The study of Neural Tangent Kernels (NTKs) has provided much-needed insight
into the convergence and generalization properties of neural networks in the
over-parametrized (wide) limit by approximating the network using a
first-order Taylor expansion with respect to its weights in the neighborhood
of their initialization values. This allows neural network training to be
analyzed from the perspective of reproducing kernel Hilbert spaces (RKHS),
which is informative in the over-parametrized regime but a poor approximation
for narrower networks, whose weights change more during training. Our goal is
to extend beyond the limits of NTK toward a more general theory. We construct
an exact power-series representation of the neural network in a finite
neighborhood of the initial weights as an inner product of two feature maps,
one from data space and one from weight-step space, into feature space, allowing
neural network training to be analyzed from the perspective of reproducing
kernel {\em Banach} spaces (RKBS). We prove that, regardless of width, the
training sequence produced by gradient descent can be exactly replicated by
regularized sequential learning in RKBS. Using this, we present a novel bound
on uniform convergence in which the iteration count and learning rate play a
central role, giving new theoretical insight into neural network training.
\end{abstract}
