\begin{abstract}
Randomized singular value decomposition (RSVD) is a class of
computationally efficient algorithms for computing the
truncated SVD of large data matrices.  Given an $n
\times n$ symmetric matrix $\M M$, the prototypical RSVD algorithm
outputs an approximation of the $k$ leading singular vectors of $\M M$
by computing the SVD of $\M M^{g} \M G$; here $g \geq 1$ is an integer
and $\M G \in \mathbb{R}^{n \times k}$ is a random Gaussian sketching
matrix. In this paper we study the statistical properties of RSVD under a
general ``signal-plus-noise'' framework, in which the observed matrix $\hat{\mathbf{M}}$ is
assumed to be an additive perturbation of some true but unknown signal
matrix $\mathbf{M}$.  We first derive upper bounds for the $\ell_2$
and $\ell_{2\to\infty}$ distances between the {\em approximate}
singular vectors of $\hat{\mathbf{M}}$ and the true
singular vectors of $\mathbf{M}$. These upper bounds depend
on the signal-to-noise ratio (SNR) and the number of power iterations
$g$. We observe a phase transition phenomenon: a smaller SNR requires larger
values of $g$ to guarantee convergence of the $\ell_2$ and
$\ell_{2\to\infty}$ distances. We also show that the thresholds for $g$ at which
these phase transitions occur are sharp
whenever the noise matrices satisfy a certain trace growth
condition. Finally, we derive normal approximations for the
row-wise fluctuations of these approximate singular vectors and for the
entrywise fluctuations of the projection of $\hat{\M M}$ onto these
vectors.
We illustrate our theoretical results by deriving nearly optimal performance
guarantees for RSVD when applied to three statistical inference
problems, namely, community detection, matrix completion, and
PCA with missing data.
\end{abstract}
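
% Illustrative sketch only. The symbols \mathbf{Y}, \tilde{\mathbf{U}},
% \tilde{\boldsymbol{\Sigma}}, \tilde{\mathbf{V}}, \lambda_i and \mathbf{u}_i are
% notation assumed here and need not match the notation used in the body of the paper.
For concreteness, the prototypical procedure described in the abstract can be
sketched as follows; the symbols $\mathbf{Y}$, $\tilde{\mathbf{U}}$,
$\tilde{\boldsymbol{\Sigma}}$, $\tilde{\mathbf{V}}$, $\lambda_i$, and
$\mathbf{u}_i$ are introduced here purely for illustration. One first forms the
sketch
\[
  \mathbf{Y} = \mathbf{M}^{g} \mathbf{G} \in \mathbb{R}^{n \times k},
\]
and then computes its thin SVD
\[
  \mathbf{Y} = \tilde{\mathbf{U}} \tilde{\boldsymbol{\Sigma}} \tilde{\mathbf{V}}^{\top},
  \qquad \tilde{\mathbf{U}} \in \mathbb{R}^{n \times k},
\]
taking the columns of $\tilde{\mathbf{U}}$ as the approximate $k$ leading
singular vectors. To see why a larger $g$ can help, write the eigendecomposition
of the symmetric matrix as
$\mathbf{M} = \sum_{i=1}^{n} \lambda_i \mathbf{u}_i \mathbf{u}_i^{\top}$ with
$|\lambda_1| \geq \dots \geq |\lambda_n|$, so that
\[
  \mathbf{M}^{g} \mathbf{G}
  = \sum_{i=1}^{n} \lambda_i^{g} \, \mathbf{u}_i \mathbf{u}_i^{\top} \mathbf{G}.
\]
Assuming $|\lambda_k| > |\lambda_{k+1}|$, the contribution of each direction
$\mathbf{u}_i$ with $i > k$ is damped by a factor
$(|\lambda_i| / |\lambda_k|)^{g}$ relative to the $k$th direction, so the column
space of $\mathbf{Y}$ aligns more closely with the leading $k$ directions as $g$
grows.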