\begin{abstract}
Neural networks exhibit good generalization behavior in the
\textit{over-parameterized} regime, where the number of network parameters
exceeds the number of observations. Nonetheless,
current generalization bounds for neural networks fail to explain this
phenomenon. In an attempt to bridge this gap, we study the problem of
learning a two-layer over-parameterized neural network when the data is
generated by a linearly separable function. In the case where the network has Leaky
ReLU activations, we provide both optimization and generalization guarantees for over-parameterized networks.
Specifically, we prove convergence rates of SGD to a global
minimum and provide generalization guarantees for this global minimum
that are independent of the network size.
%This is a surprising result since, as we show, there exist multiple global minima that overfit the training set.
Therefore, our result clearly shows that SGD both finds a global minimum and avoids overfitting, despite the high capacity of the model. This is the first theoretical demonstration that SGD can avoid overfitting when learning over-specified neural network classifiers.
\end{abstract}