\begin{abstract}
We continue a long line of research aimed at proving convergence of depth-2
neural networks, trained via gradient descent, to a global minimum. As in
many previous works, our model has the following features: regression with
a quadratic loss function, a fully connected feedforward architecture, ReLU
activations, Gaussian data instances and network initialization, and
adversarial labels. Our model is more general in that we allow both
layers to be trained simultaneously and at {\em different} rates.

Our results improve on the state of the art: \cite{OyS} (training the first
layer only) and \cite[Section 3.2]{Ngu} (training both layers with LeCun's
initialization). We also report several simple experiments with synthetic
data. They strongly suggest that, at least in our model, the convergence
phenomenon extends well beyond the ``NTK regime''.
\end{abstract}