28918c49094b29e7.tex
1: \begin{abstract}
2: 
3: 
4: Many latent-variable applications, including community detection, collaborative filtering, genomic analysis, and NLP, model data as generated by low-rank matrices. Yet despite considerable research, the number of samples required to efficiently recover the underlying matrices has remained unknown except in very special cases.
5: 
6: We determine the onset of learning in several common latent-variable settings. For all of them, we show that learning $k\hskip-.09em\times\hskip-.09em k$, rank-$r$ matrices to 
7: %total-variation ? or
8: normalized $L_{\hskip-.02em1}$
9: distance~$\epsilon$ requires $\Omega(\frac{kr}{\epsilon^2})$ samples, and propose an algorithm that uses ${\cal O}(\frac{kr}{\epsilon^2}\log^2\frac r\epsilon)$ samples, a number linear in the high dimension, and
10: %essentially
11: nearly
12: linear in the typically low~rank.
13: 
14: The algorithm improves on existing
15: spectral techniques
16: and
17: runs in polynomial time.
18: %, and uses
19: The proofs establish 
20: new results on the rapid convergence of the spectral distance between the model and observation matrices, which may be of independent interest.\looseness-1 
21: 
22: 
23: \end{abstract}
24: