\begin{abstract}
In this note, we prove that SGD converges to the global minima of an appropriately regularized $\ell_2$-empirical risk of depth-$2$ nets, for arbitrary data and any number of gates, provided the nets use sufficiently smooth and bounded activations such as sigmoid and tanh. Our analysis critically relies on a constant amount of Frobenius norm regularization on the weights, together with sampling the initial weights from an appropriate class of distributions. We also prove an exponentially fast convergence rate for continuous-time SGD, which further applies to smooth unbounded activations such as SoftPlus. Our key idea is to show that there exist loss functions on constant-sized neural nets that are ``Villani functions'', which lets us build on recent progress in analyzing SGD on such objectives.{\let\thefootnote\relax\footnote{{An extended abstract based on this work has been accepted at the Conference on the Mathematical Theory of Deep Neural Networks (DeepMath) 2022.}}}
\end{abstract}
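For concreteness, the display below sketches one standard way to write a Frobenius-regularized $\ell_2$-empirical risk for a depth-$2$ net together with a single SGD step on it. The notation is illustrative and assumed rather than fixed by the abstract: a net with $p$ gates computes $f_W(x) = \sum_{j=1}^{p} a_j\,\sigma(\langle w_j, x\rangle)$, with the outer weights $a_j$ held fixed and the inner weights $W = [w_1, \dots, w_p]$ trained on data $\{(x_i, y_i)\}_{i=1}^{n}$; $\lambda > 0$ denotes the regularization strength and $\eta > 0$ the step size.
\begin{align*}
\tilde{L}(W) &= \frac{1}{2n}\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{p} a_j\,\sigma(\langle w_j, x_i\rangle)\Big)^2 + \frac{\lambda}{2}\,\|W\|_F^2,\\
W_{t+1} &= W_t - \eta\Big(\nabla_W \tfrac{1}{2}\big(y_{i_t} - f_{W_t}(x_{i_t})\big)^2 + \lambda W_t\Big), \qquad i_t \sim \mathrm{Unif}\{1,\dots,n\}.
\end{align*}
Since each step uses a single uniformly sampled data point, the expected update direction is $-\eta\,\nabla\tilde{L}(W_t)$, and the $\lambda W_t$ term is exactly the gradient of the Frobenius penalty $\frac{\lambda}{2}\|W\|_F^2$.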