\begin{abstract}
Linear networks provide valuable insights into the workings of neural networks in general.
This paper identifies conditions under which gradient flow provably trains a linear network, in spite of the non-strict saddle points present in the optimization landscape.
It also establishes the computational complexity of training linear networks with gradient flow.
To achieve these results, this work develops machinery to provably identify the stable set of gradient flow, which in turn enables us to improve over the state of the art in the literature on linear networks~\cite{bah2019learning,arora2018convergence}.
Crucially, our results appear to be the first to break away from the \emph{lazy training} regime, which has dominated the literature on neural networks.
This work requires the network to have \emph{a} layer with one neuron, which subsumes networks with a scalar output.
Extending these theoretical results to all linear networks remains a challenging open problem.
\end{abstract}