\begin{abstract}
Linear networks provide valuable insights into the workings of neural networks in general.
This paper identifies conditions under which gradient flow provably trains a linear network, in spite of the non-strict saddle points present in the optimization landscape.
This paper also characterizes the computational complexity of training linear networks with gradient flow.
To achieve these results, this work develops machinery to provably identify the stable set of gradient flow, which then enables us to
improve over the state of the art in the literature on linear networks~\cite{bah2019learning,arora2018convergence}.
Crucially, our results appear to be the first to break away from the {lazy training} regime, which has dominated the literature on neural networks.
This work requires the network to have \emph{a} layer with one neuron, a setting that subsumes networks with a scalar output. Extending the results of this theoretical work to all linear networks remains a challenging open problem.
\end{abstract}