67993a9a6e836be1.tex
1: \begin{abstract}
2:    We consider a deep matrix factorization model of covariance matrices trained with the Bures-Wasserstein distance. 
3:    While recent works have made advances in the study of the optimization problem for overparametrized low-rank matrix approximation, much emphasis has been placed on discriminative settings and the square loss. In contrast, our model considers another type of loss and connects with the generative setting. 
4:    We characterize the critical points and minimizers of the Bures-Wasserstein
5:    distance over the space of rank-bounded matrices. The Hessian of this loss at
6:    low-rank matrices can theoretically blow up, which makes it challenging to
7:    analyze the convergence of gradient-based optimization methods. 
8:    We establish convergence results for gradient flow using a smooth perturbative version of the loss, as well as convergence results for gradient descent with finite step size under certain assumptions on the initial weights. 
9:    \end{abstract}
10: