\begin{abstract}
% We consider alternating gradient descent (AGD) with fixed step size
% $\eta > 0$, applied to the asymmetric matrix factorization objective.
% We show that, for a rank-$r$ matrix $\A \in \Real[m,n]$,
% $T = O\prn**( \prn*(\frac{\sigma_1(\A)}{\sigma_r(\A)})^2 \log(1/\epsilon))$
% iterations of alternating gradient descent suffice to reach an $\epsilon$-optimal factorization
% $\fnrm{ \A - \X{T}" \Y{T}' }^2 \leq \epsilon \fnrm{ \A}^2$ with high probability
% starting from an atypical random initialization. The
% factors have rank $d>r$ so that $\X{T}\in\Real[m,d]$ and $\Y{T} \in\Real[n,d]$.
% Experiments suggest that our proposed initialization is not merely of theoretical benefit, but rather significantly improves convergence of gradient descent in practice. Our proof is conceptually simple: a uniform PL-inequality and uniform Lipschitz smoothness constant are guaranteed for a sufficient number of iterations, starting from our special initialization. Our proof method should be useful for extending and simplifying convergence analyses for a broader class of nonconvex low-rank factorization problems.
% \end{abstract}
