245d5c58ef385910.tex
1: \begin{abstract}
2: 	We study the convergence properties of gradient descent for training deep linear neural networks, i.e., deep matrix factorizations, by extending a previous analysis of the related gradient flow.
3: 	%We prove that if the initial weight matrices are approximately balanced and the weight norm of each matrix is bounded by a constant that depends on the initial loss value and the network depth ($N$) then the weight matrices will stay bounded and balanced throughout all iteration. Moreover,
4: 	We show that, under suitable conditions on the step sizes, gradient descent converges to a critical point of the loss function, which is the square loss in this article. Furthermore, we demonstrate that, for almost all initializations, gradient descent converges to a global minimum in the case of two layers. In the case of three or more layers, we show that gradient descent converges to a global minimum on the manifold of matrices of some fixed rank, where the rank cannot be determined a priori.
5: \end{abstract}
6: