\begin{abstract}
The study of Neural Tangent Kernels (NTKs) has provided much-needed insight
into the convergence and generalization properties of neural networks in the
over-parametrized (wide) limit by approximating the network using a
first-order Taylor expansion with respect to its weights in the neighborhood
of their initialization values.  This allows neural network training to be
analyzed from the perspective of reproducing kernel Hilbert spaces (RKHS),
which is informative in the over-parametrized regime but is a poor
approximation for narrower networks, whose weights change more during
training.  Our goal is to extend beyond the limits of NTK toward a more
general theory.  We construct an exact power-series representation of the
neural network in a finite neighborhood of the initial weights as an inner
product of two feature maps, from data space and weight-step space,
respectively, to feature space, allowing neural network training to be
analyzed from the perspective of reproducing kernel {\em Banach} spaces
(RKBS).  We prove that, regardless of width, the training sequence produced by
gradient descent can be exactly replicated by regularized sequential learning
in RKBS.  Using this, we present a novel bound on uniform convergence in which
the iteration count and learning rate play a central role, giving new
theoretical insight into neural network training.
\end{abstract}
