\begin{abstract}
We continue a long line of research aimed at proving convergence of depth-2
neural networks, trained via gradient descent, to a global minimum. As in
many previous works, our model has the following features: regression with
a quadratic loss function, a fully connected feedforward architecture, ReLU
activations, Gaussian data instances and network initialization, and
adversarial labels. It is more general in the sense that we allow both
layers to be trained simultaneously and at {\em different} rates.

Our results improve on the state-of-the-art results of \cite{OyS} (training
the first layer only) and \cite[Section 3.2]{Ngu} (training both layers with
LeCun's initialization). We also report several simple experiments with
synthetic data. They strongly suggest that, at least in our model, the
convergence phenomenon extends well beyond the ``NTK regime''.
\end{abstract}