\begin{abstract}
%   We consider alternating gradient descent (AGD) with fixed step size
%   $\eta > 0$, applied to the asymmetric matrix factorization objective.
%   We show that, for a rank-$r$ matrix $\A \in \Real[m,n]$,
%   $T = O\prn**( \prn*(\frac{\sigma_1(\A)}{\sigma_r(\A)})^2 \log(1/\epsilon))$
%   iterations of alternating gradient descent suffice to reach an $\epsilon$-optimal factorization
%   $\fnrm{ \A - \X{T}" \Y{T}' }^2 \leq \epsilon \fnrm{ \A}^2$ with high probability
%   starting from an atypical random initialization. The
%   factors have rank $d>r$ so that $\X{T}\in\Real[m,d]$ and $\Y{T} \in\Real[n,d]$.
%   Experiments suggest that our proposed initialization is not merely of theoretical benefit, but rather significantly improves convergence of gradient descent in practice. Our proof is conceptually simple: a uniform PL-inequality and uniform Lipschitz smoothness constant are guaranteed for a sufficient number of iterations, starting from our special initialization. Our proof method should be useful for extending and simplifying convergence analyses for a broader class of nonconvex low-rank factorization problems.
% \end{abstract}