\begin{abstract}
We study the convergence of gradient flows related to learning deep linear
neural networks (where the activation function is the identity map) from data.
In this case, the composition of the network layers amounts to simply
multiplying the weight matrices of all layers together, resulting in an
overparameterized problem. The gradient flow with respect to these
factors can be re-interpreted as a Riemannian gradient flow on the manifold of
rank-$r$ matrices endowed with a suitable Riemannian metric. We show that the
flow always converges to a critical point of the underlying functional.
Moreover, we establish that, for almost all initializations, the flow converges
to a global minimum on the manifold of rank-$k$ matrices for some $k\leq r$.
\end{abstract}
