1: \begin{abstract}
2: In this work we demonstrate provable guarantees on the training of depth-$2$ neural networks in regimes not previously explored. (1) First, we give a simple stochastic algorithm that can train a $\relu$ gate in the realizable setting in {\it linear time} while using significantly milder conditions on the data distribution than previous results. Leveraging some additional distributional assumptions, we also show approximate recovery of the true label-generating parameters when training a $\relu$ gate while a probabilistic adversary is allowed to corrupt the true labels of the training data. Our guarantee on recovering the true weight degrades gracefully with increasing probability of attack and is nearly optimal in the worst case. Additionally, our analysis allows for mini-batching and computes how the convergence time scales with the mini-batch size. (2) Second, we exhibit a non-gradient iterative algorithm ``{\rm Neuro-Tron}'' which gives a first-of-its-kind polynomial-time approximate solution of a neural regression problem (here in the $\ell_\infty$-norm) at finite net widths and for non-realizable data.
3:
4: %(3) Lastly we analyze the behaviour of noise assisted gradient descent on a $\relu$ gate in the realizable setting. While making no further distributional assumptions, we locate a ball centered at the origin such that all the iterates remain inside it with high probability.
5:
6:
7: %\keywords{neural nets, non-gradient iterative algorithms, stochastic algorithms, non-smooth non-convex optimization}
8:
11:
12: \end{abstract}
13: