\begin{abstract}
We consider a deep matrix factorization model of covariance matrices trained with the Bures-Wasserstein distance.
While recent works have made advances in the study of the optimization problem for overparametrized low-rank matrix approximation, much emphasis has been placed on discriminative settings and the square loss. In contrast, our model uses a different loss, the Bures-Wasserstein distance, and connects with the generative setting.
We characterize the critical points and minimizers of the Bures-Wasserstein
distance over the space of rank-bounded matrices. At low-rank matrices, the
Hessian of this loss can theoretically blow up, which makes it challenging to
analyze the convergence of gradient optimization methods.
We establish convergence results for gradient flow using a smooth perturbative version of the loss, as well as convergence results for finite-step-size gradient descent under certain assumptions on the initial weights.
\end{abstract}