\begin{abstract}
%   We consider alternating gradient descent (AGD) with fixed step size
%   $\eta > 0$, applied to the asymmetric matrix factorization objective.
%   We show that, for a rank-$r$ matrix $\A \in \Real[m,n]$,
%   $T = O\prn**( \prn*(\frac{\sigma_1(\A)}{\sigma_r(\A)})^2 \log(1/\epsilon))$
%   iterations of alternating gradient descent suffice to reach an $\epsilon$-optimal factorization
%   $\fnrm{ \A - \X{T} \Y{T}' }^2 \leq \epsilon \fnrm{ \A}^2$ with high probability
%   starting from an atypical random initialization. The
%   factors have rank $d>r$ so that $\X{T}\in\Real[m,d]$ and $\Y{T} \in\Real[n,d]$.
%   Experiments suggest that our proposed initialization is not merely of theoretical benefit, but rather significantly improves convergence of gradient descent in practice. Our proof is conceptually simple: a uniform Polyak-\L{}ojasiewicz (PL) inequality and a uniform Lipschitz smoothness constant are guaranteed for a sufficient number of iterations, starting from our special initialization. Our proof method should be useful for extending and simplifying convergence analyses for a broader class of nonconvex low-rank factorization problems.
% \end{abstract}
