\begin{abstract}
We consider alternating gradient descent (AGD) with fixed step size
$\eta > 0$, applied to the asymmetric matrix factorization objective.
We show that, for a rank-$r$ matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$,
$T = \mathcal{O}\left( \left(\frac{\sigma_1(\mathbf{A})}{\sigma_r(\mathbf{A})}\right)^2 \log(1/\epsilon)\right)$
iterations of alternating gradient descent suffice to reach an $\epsilon$-optimal factorization
$\| \mathbf{A} - \mathbf{X}_T^{\vphantom{\intercal}} \mathbf{Y}_T^{\intercal} \|_{\rm F}^2 \leq \epsilon \| \mathbf{A} \|_{\rm F}^2$
with high probability,
starting from an atypical random initialization. The
factors have rank $d>r$ so that $\mathbf{X}_T\in\mathbb{R}^{m \times d}$
and $\mathbf{Y}_T \in\mathbb{R}^{n \times d}$.
Experiments suggest that our proposed initialization is not merely
of theoretical benefit, but rather significantly improves
the convergence of gradient descent in practice. Our proof is
conceptually simple: a uniform Polyak--\L{}ojasiewicz (PL) inequality
and a uniform Lipschitz smoothness constant are guaranteed for a
sufficient number of iterations, starting from our random
initialization. Our proof
method should be useful for extending and simplifying convergence
analyses for a broader class of nonconvex low-rank factorization
problems.
\end{abstract}

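For concreteness, the setting summarized in the abstract can be sketched as follows. This is a sketch under assumptions: we take the standard least-squares form of the asymmetric factorization objective, and the factor $1/2$ and the order of the two half-steps are conventions assumed here rather than stated in the abstract.
\[
f(\mathbf{X}, \mathbf{Y}) = \tfrac{1}{2} \| \mathbf{A} - \mathbf{X} \mathbf{Y}^{\intercal} \|_{\rm F}^2 ,
\]
\[
\mathbf{X}_{t+1} = \mathbf{X}_t + \eta \, (\mathbf{A} - \mathbf{X}_t^{\vphantom{\intercal}} \mathbf{Y}_t^{\intercal}) \mathbf{Y}_t ,
\qquad
\mathbf{Y}_{t+1} = \mathbf{Y}_t + \eta \, (\mathbf{A} - \mathbf{X}_{t+1}^{\vphantom{\intercal}} \mathbf{Y}_t^{\intercal})^{\intercal} \mathbf{X}_{t+1} .
\]
Under this form, each half-step is an ordinary gradient step in one factor with the other held fixed, since $\nabla_{\mathbf{X}} f = -(\mathbf{A} - \mathbf{X}\mathbf{Y}^{\intercal})\mathbf{Y}$ and $\nabla_{\mathbf{Y}} f = -(\mathbf{A} - \mathbf{X}\mathbf{Y}^{\intercal})^{\intercal}\mathbf{X}$; the second half-step uses the freshly updated $\mathbf{X}_{t+1}$.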