\begin{abstract}
Randomized singular value decomposition (RSVD) is a class of
computationally efficient algorithms for computing the
truncated SVD of large data matrices.  Given an $n
\times n$ symmetric matrix $\M M$, the prototypical RSVD algorithm
outputs an approximation of the $k$ leading singular vectors of $\M M$
by computing the SVD of $\M M^{g} \M G$; here $g \geq 1$ is an integer
and $\M G \in \mathbb{R}^{n \times k}$ is a random Gaussian sketching
matrix. In this paper we study the statistical properties of RSVD under a
general ``signal-plus-noise'' framework, i.e., the observed matrix $\hat{\mathbf{M}}$ is
assumed to be an additive perturbation of some true but unknown signal
matrix $\mathbf{M}$.  We first derive upper bounds for the $\ell_2$
and $\ell_{2\to\infty}$ distances between the {\em approximate}
singular vectors of $\hat{\mathbf{M}}$ and the true
singular vectors of $\mathbf{M}$. These upper bounds depend
on the signal-to-noise ratio (SNR) and the number of power iterations
$g$. A phase transition phenomenon is observed in which a smaller SNR requires larger
values of $g$ to guarantee convergence of the $\ell_2$ and $\ell_{2\to
  \infty}$ distances. We also show that the thresholds for $g$ where
these phase transitions occur are sharp
whenever the noise matrices satisfy a certain trace growth
condition. Finally, we derive normal approximations for the
row-wise fluctuations of these approximate singular vectors and the
entrywise fluctuations when $\hat{\M M}$ is projected onto these
vectors.
We illustrate our theoretical results by deriving nearly-optimal performance
guarantees for RSVD when applied to three statistical inference
problems, namely, community detection, matrix completion, and
PCA with missing data.
\end{abstract}
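
% Illustrative sketch only: the symbols \hat{\mathbf{Y}}, \hat{\mathbf{U}}_g,
% \hat{\mathbf{S}}_g and \hat{\mathbf{W}}_g are placeholder names assumed here,
% not notation taken from the paper.
For orientation, the prototypical procedure described in the abstract can be
sketched as follows when it is applied to the observed matrix
$\hat{\mathbf{M}}$. This is a minimal outline under assumed notation: the
symbols $\hat{\mathbf{Y}}$, $\hat{\mathbf{U}}_g$, $\hat{\mathbf{S}}_g$ and
$\hat{\mathbf{W}}_g$ are placeholders introduced here for illustration, and
taking the entries of $\mathbf{G}$ to be i.i.d.\ standard normal is likewise an
assumption about what ``Gaussian sketching matrix'' means.
\[
  \hat{\mathbf{Y}} = \hat{\mathbf{M}}^{g}\,\mathbf{G} \in \mathbb{R}^{n \times k},
  \qquad
  \hat{\mathbf{Y}} = \hat{\mathbf{U}}_g\,\hat{\mathbf{S}}_g\,\hat{\mathbf{W}}_g^{\top}
  \quad \text{(thin SVD)}.
\]
Under this reading, the $n \times k$ matrix $\hat{\mathbf{U}}_g$ of left
singular vectors of $\hat{\mathbf{Y}}$ plays the role of the approximate
leading singular vectors, and larger values of $g$ correspond to more power
iterations.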
