\begin{abstract}
	Neural networks exhibit good generalization behavior in the
	\textit{over-parameterized} regime, where the number of network parameters
	exceeds the number of observations. Nonetheless,
	current generalization bounds for neural networks fail to explain this
	phenomenon. In an attempt to bridge this gap, we study the problem of
	learning a two-layer over-parameterized neural network when the data is generated by a linearly separable function. In the case where the network has Leaky
	ReLU activations, we provide both optimization and generalization guarantees for over-parameterized networks.
	Specifically, we prove convergence rates of SGD to a global
	minimum and provide generalization guarantees for this global minimum
	that are independent of the network size.
	%This is a surprising result since, as we show, there exist multiple global minima that overfit the training set.
	Therefore, our result shows that SGD both finds a global minimum and avoids overfitting, despite the high capacity of the model. This is the first theoretical demonstration that SGD can avoid overfitting when learning over-specified neural network classifiers.
\end{abstract}