
\begin{abstract}
We study the convergence of gradient flows related to learning deep linear
neural networks (where the activation function is the identity map) from data.
In this case, the composition of the network layers amounts to simply
multiplying the weight matrices of all layers together, resulting in an
overparameterized problem. The gradient flow with respect to these
factors can be re-interpreted as a Riemannian gradient flow on the manifold of
rank-$r$ matrices endowed with a suitable Riemannian metric. We show that the
flow always converges to a critical point of the underlying functional.
Moreover, we establish that, for almost all initializations, the flow converges
to a global minimum on the manifold of rank-$k$ matrices for some $k\leq r$.
\end{abstract}
