\begin{abstract}
  We consider alternating gradient descent (AGD) with fixed step size
  $\eta > 0$, applied to the asymmetric matrix factorization objective.
  We show that, for a rank-$r$ matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$,
  $T = \left( \left(\frac{\sigma_1(\mathbf{A})}{\sigma_r(\mathbf{A})}\right)^2 \log(1/\epsilon)\right)$
  iterations of AGD suffice to reach an $\epsilon$-optimal factorization
  $\| \mathbf{A} - \mathbf{X}_T^{\vphantom{\intercal}} \mathbf{Y}_T^{\intercal} \|_{\rm F}^2 \leq \epsilon \| \mathbf{A} \|_{\rm F}^2$,
  with high probability,
  starting from an atypical random initialization. The
  factors have rank $d>r$ so that $\mathbf{X}_T\in\mathbb{R}^{m \times d}$
  and $\mathbf{Y}_T \in\mathbb{R}^{n \times d}$.
  Experiments suggest that our proposed initialization is not merely
  of theoretical benefit but also significantly improves the
  convergence of gradient descent in practice. Our proof is
  conceptually simple: a uniform Polyak-\L{}ojasiewicz (PL) inequality
  and a uniform Lipschitz smoothness constant are guaranteed for a
  sufficient number of iterations, starting from our random
  initialization. Our proof
  method should be useful for extending and simplifying convergence
  analyses for a broader class of nonconvex low-rank factorization
  problems.
\end{abstract}
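
For concreteness, the setting summarized in the abstract can be written out as follows. This is a sketch under assumptions the abstract does not state: the objective is taken with the common $\frac{1}{2}$ normalization, and the factor $\mathbf{X}$ is updated before $\mathbf{Y}$ in each iteration.
\[
  f(\mathbf{X}, \mathbf{Y}) = \frac{1}{2} \| \mathbf{A} - \mathbf{X} \mathbf{Y}^{\intercal} \|_{\rm F}^2 ,
\]
\[
  \mathbf{X}_{t+1} = \mathbf{X}_t + \eta \, ( \mathbf{A} - \mathbf{X}_t^{\vphantom{\intercal}} \mathbf{Y}_t^{\intercal} ) \mathbf{Y}_t ,
  \qquad
  \mathbf{Y}_{t+1} = \mathbf{Y}_t + \eta \, ( \mathbf{A} - \mathbf{X}_{t+1}^{\vphantom{\intercal}} \mathbf{Y}_t^{\intercal} )^{\intercal} \mathbf{X}_{t+1} .
\]
Each update is a gradient step in one factor with the other held fixed, since $\nabla_{\mathbf{X}} f = -(\mathbf{A} - \mathbf{X}\mathbf{Y}^{\intercal})\mathbf{Y}$ and $\nabla_{\mathbf{Y}} f = -(\mathbf{A} - \mathbf{X}\mathbf{Y}^{\intercal})^{\intercal}\mathbf{X}$.

The role of the two uniform constants can be illustrated by the textbook argument for plain (non-alternating) gradient descent; it is given here only as an illustration, not as the argument used for AGD. The symbols $\mathbf{Z}_t$, $L$, and $\mu$ are notation assumed for this sketch. Let $\mathbf{Z}_t$ denote the iterate, and suppose that along the iterates $f$ is $L$-smooth and satisfies the PL inequality $\| \nabla f(\mathbf{Z}) \|_{\rm F}^2 \geq 2 \mu f(\mathbf{Z})$, where the minimum value of $f$ is $0$ because $\mathbf{A}$ has rank $r < d$. Then, for any step size $\eta \leq 1/L$,
\[
  f(\mathbf{Z}_{t+1})
  \leq f(\mathbf{Z}_t) - \frac{\eta}{2} \| \nabla f(\mathbf{Z}_t) \|_{\rm F}^2
  \leq (1 - \eta \mu) \, f(\mathbf{Z}_t) ,
\]
so that $f(\mathbf{Z}_T) \leq (1 - \eta\mu)^T f(\mathbf{Z}_0) \leq \epsilon \, f(\mathbf{Z}_0)$ as soon as $T \geq \frac{1}{\eta \mu} \log(1/\epsilon)$. In this simplified picture, the iteration count scales like $L/\mu$ times $\log(1/\epsilon)$ when $\eta$ is of order $1/L$.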