\begin{abstract}
Many latent-variable applications, including community detection, collaborative filtering, genomic analysis, and NLP, model data as generated by low-rank matrices. Yet despite considerable research, except in very special cases, the number of samples required to efficiently recover the underlying matrices has remained unknown.

We determine the onset of learning in several common latent-variable settings. For each of them, we show that learning $k\hskip-.09em\times\hskip-.09em k$ matrices of rank~$r$ to normalized $L_{\hskip-.02em1}$ distance~$\epsilon$ requires $\Omega(\frac{kr}{\epsilon^2})$ samples, and we propose an algorithm that uses ${\cal O}(\frac{kr}{\epsilon^2}\log^2\frac r\epsilon)$ samples, a number linear in the high dimension and nearly linear in the typically low rank.
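Ignoring constant factors, the ratio of the two sample counts,
\[
\frac{kr}{\epsilon^2}\log^2\frac r\epsilon \Big/ \frac{kr}{\epsilon^2} \;=\; \log^2\frac r\epsilon ,
\]
does not depend on~$k$, so the algorithm is optimal up to a factor polylogarithmic in $r/\epsilon$.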

The algorithm improves on existing spectral techniques and runs in polynomial time. The proofs establish new results on the rapid convergence of the spectral distance between the model and observation matrices, which may be of independent interest.\looseness-1

\end{abstract}