\begin{abstract}
	Neural networks exhibit good generalization behavior in the
	\textit{over-parameterized} regime, where the number of network parameters
	exceeds the number of observations. Nonetheless,
	current generalization bounds for neural networks fail to explain this
	phenomenon. In an attempt to bridge this gap, we study the problem of
	learning a two-layer over-parameterized neural network when the data is generated by a linearly separable function. In the case where the network has Leaky
	ReLU activations, we provide both optimization and generalization guarantees for over-parameterized networks.
	Specifically, we prove convergence rates of SGD to a global
	minimum and provide generalization guarantees for this global minimum
	that are independent of the network size.
	%This is a surprising result since, as we show, there exist multiple global minima that overfit the training set.
	Our result therefore shows that SGD both finds a global minimum and avoids overfitting despite the high capacity of the model. This is the first theoretical demonstration that SGD can avoid overfitting when learning over-specified neural network classifiers.
\end{abstract}