0809.2274/pca.tex
1: \documentclass[final]{siamltex}
2: \usepackage{amsmath}
3: \usepackage{graphicx}
4: 
5: \def\T{{\hbox{\scriptsize{\rm T}}}}
6: \def\tinyT{{\hbox{\tiny{\rm T}}}}
7: \def\epsilon{\varepsilon}
8: \def\phi{\varphi}
9: \def\bigoh{\mathcal{O}}
10: \def\th{{\rm th}}
11: \def\ith{{\it th}}
12: \def\rd{{\rm rd}}
13: \def\ird{{\it rd}}
14: \def\nd{{\rm nd}}
15: \def\ind{{\it nd}}
16: \def\st{{\rm st}}
17: \def\ist{{\it st}}
18: \def\Id{{\bf 1}}
19: \def\0s{{\bf 0}}
20: 
21: \def\registered{$^{\hbox{\ooalign{\hfil\raise .20ex\hbox{\textbf{\tiny R}}\hfil\crcr\mathhexbox20C}}}$}
22: 
23: \newtheorem{observe}[theorem]{Observation}
24: \newtheorem{remark1}[theorem]{Remark}
25: 
26: \newenvironment{observation}{\begin{observe} \rm}{\end{observe}}
27: \newenvironment{remark}{\begin{remark1} \rm}{\end{remark1}}
28: 
29: 
30: \title{A randomized algorithm for\\principal component analysis}
31: \author{Vladimir Rokhlin\thanks{Departments of Computer Science, Mathematics,
32: and Physics, Yale University, New Haven, CT 06511;
33: supported in part by DARPA/AFOSR Grant FA9550-07-1-0541.} \and
34: Arthur Szlam\thanks{Department of Mathematics, UCLA, Los Angeles, CA 90095-1555;
35: supported in part by NSF Grant DMS-0811203 ({\tt aszlam@math.ucla.edu}).} \and
36: Mark Tygert\thanks{Department of Mathematics, UCLA, Los Angeles, CA 90095-1555
37: ({\tt tygert@aya.yale.edu}).}
38: }
39: 
40: 
41: \begin{document}
42: 
43: \maketitle
44: 
45: \begin{abstract}
46: Principal component analysis (PCA) requires the computation
47: of a low-rank approximation to a matrix containing the data being analyzed.
48: In many applications of PCA, the best possible accuracy
49: of any rank-deficient approximation is at most a few digits
50: (measured in the spectral norm,
51: relative to the spectral norm of the matrix being approximated).
52: In such circumstances, efficient algorithms have not come
53: with guarantees of good accuracy,
54: unless one or both dimensions of the matrix being approximated are small.
55: We describe an efficient algorithm for the low-rank approximation of matrices
56: that produces accuracy very close to the best possible,
57: for matrices of arbitrary sizes.
58: We illustrate our theoretical results via several numerical examples.
59: \end{abstract}
60: 
61: \begin{keywords}
62: PCA, singular value decomposition, SVD, low rank, Lanczos, power
63: \end{keywords}
64: 
65: \begin{AMS}
66: 65F15, 65C60, 68W20
67: \end{AMS}
68: 
69: 
70: \pagestyle{myheadings}
71: \thispagestyle{plain}
72: \markboth{ROKHLIN, SZLAM, AND TYGERT}{A RANDOMIZED ALGORITHM FOR PCA}
73: 
74: 
75: 
76: \section{Introduction}
77: 
78: Principal component analysis\,(PCA)\,is among the most\,widely used techniques
79: in statistics, data analysis, and data mining.
80: PCA is the basis of many machine learning methods,
81: including the latent semantic analysis
82: of large databases of text and HTML documents described
83: in~\cite{deerwester-dumais-furnas-landauer-harshman}.
84: Computationally, PCA amounts to the low-rank approximation of a matrix
85: containing the data being analyzed.
86: The present article describes an algorithm
87: for the low-rank approximation of matrices, suitable for PCA.
88: This paper demonstrates both theoretically and via numerical examples
89: that the algorithm efficiently produces low-rank approximations
90: whose accuracies are very close to the best possible.
91: 
92: The canonical construction of the best possible rank-$k$ approximation
93: to a real $m \times n$ matrix $A$ uses the singular value decomposition (SVD)
94: of $A$,
95: %
96: \begin{equation}
97: \label{full_svd}
98: A = U \, \Sigma \, V^\T,
99: \end{equation}
100: %
101: where $U$ is a real unitary $m \times m$ matrix,
102: $V$ is a real unitary $n \times n$ matrix,
103: and $\Sigma$ is a real $m \times n$ matrix whose only nonzero entries
104: appear in nonincreasing order on the diagonal and are nonnegative.
105: The diagonal entries $\sigma_1$,~$\sigma_2$,
106: \dots, $\sigma_{\min(m,n)-1}$,~$\sigma_{\min(m,n)}$
107: of $\Sigma$ are known as the singular values of $A$.
108: The best rank-$k$ approximation to $A$, with $k < m$ and $k < n$, is
109: %
110: \begin{equation}
111: \label{low_rank_approx}
112: A \approx \tilde{U} \, \tilde{\Sigma} \, \tilde{V}^\T,
113: \end{equation}
114: %
115: where $\tilde{U}$ is the leftmost $m \times k$ block of $U$,
116: $\tilde{V}$ is the leftmost $n \times k$ block of $V$,
117: and $\tilde{\Sigma}$ is the $k \times k$ matrix
118: whose only nonzero entries appear in nonincreasing order on the diagonal
119: and are the $k$ greatest singular values of $A$.
120: This approximation is ``best'' in the sense that
121: the spectral norm $\| A - B \|$ of the difference between $A$
122: and a rank-$k$ matrix $B$ is minimal
123: for $B = \tilde{U} \, \tilde{\Sigma} \, \tilde{V}^\T$.
124: In fact,
125: %
126: \begin{equation}
127: \| A - \tilde{U} \, \tilde{\Sigma} \, \tilde{V}^\T \| = \sigma_{k+1},
128: \end{equation}
129: %
130: where $\sigma_{k+1}$ is the $(k+1)^\st$ greatest singular value of $A$.
131: For more information about the SVD, see, for example,
132: Chapter~8 in~\cite{golub-van_loan}.
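
As a small illustration (the matrix below is a hypothetical example
chosen only for this purpose), consider $k = 1$ and
%
\begin{equation*}
A = \left( \begin{array}{ccc} 3 & 0 & 0 \\ 0 & 2 & 0 \end{array} \right),
\qquad
\tilde{U} \, \tilde{\Sigma} \, \tilde{V}^\T
= \left( \begin{array}{ccc} 3 & 0 & 0 \\ 0 & 0 & 0 \end{array} \right).
\end{equation*}
%
The singular values of $A$ are $\sigma_1 = 3$ and $\sigma_2 = 2$.
The difference $A - \tilde{U} \, \tilde{\Sigma} \, \tilde{V}^\T$
has the single nonzero entry $2$, so its spectral norm is
$2 = \sigma_2 = \sigma_{k+1}$.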
133: 
134: For definiteness, let us assume that $m \le n$
135: and that $A$ is an arbitrary (dense) real $m \times n$ matrix.
136: To compute a rank-$k$ approximation to $A$,
137: one might form the matrices $U$, $\Sigma$, and $V$ in~(\ref{full_svd}),
138: and then use them to construct $\tilde{U}$, $\tilde{\Sigma}$, and $\tilde{V}$
139: in~(\ref{low_rank_approx}).
140: However, even computing just $\Sigma$, the leftmost $m$ columns of $U$,
141: and the leftmost $m$ columns of $V$ requires at least
142: $\bigoh(n m^2)$ floating-point operations (flops) using any
143: of the standard algorithms
144: (see, for example, Chapter~8 in~\cite{golub-van_loan}).
145: Alternatively, one might use pivoted $QR$-decomposition algorithms,
146: which require $\bigoh(nmk)$ flops
147: and typically produce a rank-$k$ approximation $B$ to $A$ such that
148: %
149: \begin{equation}
150: \label{gu_bound}
151: \| A - B \| \le 10 \sqrt{m} \; \sigma_{k+1},
152: \end{equation}
153: %
154: where $\|A-B\|$ is the spectral norm of $A-B$,
155: and $\sigma_{k+1}$ is the $(k+1)^\st$ greatest singular value of $A$
156: (see, for example, Chapter~5 in~\cite{golub-van_loan}).
157: Furthermore, the algorithms of~\cite{gu-eisenstat} require only
158: about $\bigoh(nmk)$ flops to produce a rank-$k$ approximation that
159: (unlike an approximation produced by a pivoted $QR$-decomposition)
160: has been guaranteed to satisfy a bound nearly as strong as~(\ref{gu_bound}).
161: 
162: While the accuracy in~(\ref{gu_bound}) is sufficient
163: for many applications of low-rank approximation,
164: PCA often involves $m \ge 10{,}000$,
165: and a ``signal-to-noise ratio'' $\sigma_1/\sigma_{k+1} \le 100$,
166: where $\sigma_1 = \|A\|$ is the greatest singular value of $A$,
167: and $\sigma_{k+1}$ is the $(k+1)^\st$ greatest.
168: Moreover, the singular values $\le \sigma_{k+1}$
169: often arise from noise in the process generating the data in $A$,
170: making the singular values of $A$ decay so slowly that
171: $\sigma_m \ge \sigma_{k+1}/10$.
172: When $m \ge 10{,}000$, $\sigma_1/\sigma_{k+1} \le 100$,
173: and $\sigma_m \ge \sigma_{k+1}/10$, the rank-$k$ approximation $B$ produced
174: by a pivoted $QR$-decomposition algorithm
175: typically satisfies $\| A - B \| \sim \| A \|$
176: --- the ``approximation'' $B$ is effectively unrelated
177: to the matrix $A$ being approximated!
178: For large matrices whose ``signal-to-noise ratio''
179: $\sigma_1/\sigma_{k+1}$ is less than 10,000,
180: the $\sqrt{m}$ factor in~(\ref{gu_bound}) may be unacceptable.
181: Now, pivoted $QR$-decomposition algorithms are not the only algorithms
182: which can compute a rank-$k$ approximation using $\bigoh(nmk)$ flops.
183: However, other algorithms, such as those of
184: \cite{achlioptas-mcsherry0}, \cite{achlioptas-mcsherry}, \cite{chan-hansen},
185: \cite{clarkson-woodruff}, \cite{deshpande-rademacher-vempala-wang},
186: \cite{deshpande-vempala}, \cite{drineas-drinea-huggins},
187: \cite{drineas-kannan-mahoney2}, \cite{drineas-kannan-mahoney3},
188: \cite{drineas-mahoney-muthukrishnan1}, \cite{drineas-mahoney-muthukrishnan2},
189: \cite{friedland-kaveh-niknejad-zare}, \cite{frieze-kannan},
190: \cite{frieze-kannan-vempala0}, \cite{frieze-kannan-vempala},
191: \cite{goreinov-tyrtyshnikov}, \cite{goreinov-tyrtyshnikov-zamarashkin2},
192: \cite{goreinov-tyrtyshnikov-zamarashkin1}, \cite{gu-eisenstat},
193: \cite{har-peled},
194: \cite{liberty-woolfe-martinsson-rokhlin-tygert}, \cite{mahoney-drineas},
195: \cite{papadimitriou-raghavan-tamaki-vempala},
196: \cite{sarlos3}, \cite{sarlos4}, \cite{sun-xie-zhang-faloutsos},
197: \cite{tyrtyshnikov}, and~\cite{woolfe-liberty-rokhlin-tygert},
198: also yield accuracies involving factors of at least $\sqrt{m}$
199: when the singular values $\sigma_{k+1}$, $\sigma_{k+2}$, $\sigma_{k+3}$, \dots\
200: of $A$ decay slowly.
201: (The decay is rather slow if, for example,
202: $\sigma_{k+j} \sim j^\alpha \, \sigma_{k+1}$
203: for $j = 1$,~$2$,~$3$, \dots, with $-1/2 < \alpha \le 0$.
204: Many of these other algorithms are designed to produce approximations
205: having special properties not treated in the present paper,
206: and their spectral-norm accuracy is good when the singular values decay
207: sufficiently fast. Fairly recent surveys of algorithms
208: for low-rank approximation are available in~\cite{sarlos3}, \cite{sarlos4},
209: and~\cite{liberty-woolfe-martinsson-rokhlin-tygert}.)
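
To make the arithmetic behind the concern about~(\ref{gu_bound}) concrete,
take the illustrative values $m = 10{,}000$ and
$\sigma_1/\sigma_{k+1} = 100$.
The right-hand side of~(\ref{gu_bound}) is then
%
\begin{equation*}
10 \sqrt{m} \; \sigma_{k+1} = 10 \cdot 100 \cdot \sigma_{k+1}
= 1000 \, \sigma_{k+1} = 10 \, \sigma_1 = 10 \, \| A \|.
\end{equation*}
%
More generally, $1000 \, \sigma_{k+1} \ge 10 \, \| A \|$
whenever $\sigma_1/\sigma_{k+1} \le 100$.
By comparison, the trivial approximation $B = \0s$ already attains
$\| A - B \| = \| A \|$, so in this regime the bound~(\ref{gu_bound})
by itself carries no useful information about the accuracy.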
210: 
211: The algorithm described in the present paper produces
212: a rank-$k$ approximation $B$ to $A$ such that
213: %
214: \begin{equation}
215: \label{very_rough}
216: \| A - B \| \le C \, m^{1/(4i+2)} \, \sigma_{k+1}
217: \end{equation}
218: %
219: with very high probability (typically $1-10^{-15}$, independent of $A$,
220: with the choice of parameters from Remark~\ref{par_remark} below),
221: where $\|A-B\|$ is the spectral norm of $A-B$,
222: $i$ is a nonnegative integer specified by the user,
223: $\sigma_{k+1}$ is the $(k+1)^\st$ greatest singular value of $A$,
224: and $C$ is a constant independent of $A$
225: that theoretically may depend on the parameters of the algorithm.
226: (Numerical evidence such as that in Section~\ref{numerical}
227: suggests at the very least that $C < 10$;
228: (\ref{explicit_eval}) and~(\ref{the_point})
229: in Section~\ref{algorithm} provide more complicated theoretical bounds on $C$.)
230: The algorithm requires $\bigoh(nmki)$ floating-point operations when $i>0$.
231: In many applications of PCA, $i = 1$ or $i = 2$ is sufficient,
232: and the algorithm then requires only $\bigoh(nmk)$ flops.
233: The algorithm provides the rank-$k$ approximation $B$ in the form of an SVD,
234: outputting three matrices, $\tilde{U}$, $\tilde{\Sigma}$, and $\tilde{V}$,
235: such that $B = \tilde{U} \, \tilde{\Sigma} \, \tilde{V}^\T$,
236: where the columns of $\tilde{U}$ are orthonormal,
237: the columns of $\tilde{V}$ are orthonormal,
238: and the entries of $\tilde{\Sigma}$ are all nonnegative
239: and zero off the diagonal.
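
For a sense of scale, assume $m = 10{,}000$ (an illustrative size).
The factor $m^{1/(4i+2)} = 10^{2/(2i+1)}$ in~(\ref{very_rough})
then takes the following values, not counting the constant $C$:
%
\begin{equation*}
\begin{array}{c|cccc}
i & 0 & 1 & 2 & 3 \\\hline
m^{1/(4i+2)} & 100 & \approx 4.64 & \approx 2.51 & \approx 1.93
\end{array}
\end{equation*}
%
Thus $i = 0$ gives the same factor $\sqrt{m}$ that appears
in~(\ref{gu_bound}), whereas $i = 1$ or $i = 2$ already reduces the factor
to a small single-digit number.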
240: 
241: The algorithm of the present paper is randomized,
242: but succeeds with very high probability;
243: for example, the bound~(\ref{explicit_eval}) on its accuracy holds
244: with probability greater than $1-10^{-15}$.
245: The algorithm is similar to many recently discussed randomized algorithms
246: for low-rank approximation, but produces approximations of higher accuracy
247: when the singular values $\sigma_{k+1}$, $\sigma_{k+2}$, $\sigma_{k+3}$, \dots\
248: of the matrix being approximated decay slowly; see, for example, \cite{sarlos3}
249: or~\cite{liberty-woolfe-martinsson-rokhlin-tygert}.
250: The algorithm is a variant of that in~\cite{roweis},
251: and the analysis of the present paper should extend to the algorithm
252: of~\cite{roweis}; \cite{roweis} stimulated the authors' collaboration.
253: The algorithm may be regarded as a generalization
254: of the randomized power methods of~\cite{dixon}
255: and~\cite{kuczynski-wozniakowski},
256: and in fact we use the latter to ascertain the approximations' accuracy
257: rapidly and reliably.
258: 
259: The algorithm admits obvious ``out-of-core'' and parallel implementations
260: (assuming that the user chooses the parameter $i$ in~(\ref{very_rough})
261: to be reasonably small).
262: As with the algorithms of~\cite{dixon}, \cite{kuczynski-wozniakowski},
263: \cite{liberty-woolfe-martinsson-rokhlin-tygert},
264: \cite{martinsson-rokhlin-tygert3}, \cite{roweis},
265: \cite{sarlos3}, and~\cite{sarlos4},
266: the core steps of the algorithm of the present paper
267: involve the application of the matrix $A$ being approximated
268: and its transpose $A^\T$ to random vectors.
269: The algorithm is more efficient when $A$ and $A^\T$ can be applied rapidly
270: to arbitrary vectors, such as when $A$ is sparse.
271: 
272: Throughout the present paper, we use $\Id$ to denote an identity matrix.
273: We use $\0s$ to denote a matrix whose entries are all zeros.
274: For any matrix $A$, we use $\|A\|$ to denote the spectral norm of $A$,
275: that is, $\|A\|$ is the greatest singular value of $A$.
276: Furthermore, the entries of all matrices in the present paper are real valued,
277: though the algorithm and analysis extend trivially to matrices
278: whose entries are complex valued.
279: 
280: The present paper has the following structure:
281: Section~\ref{prelims} collects together various known facts
282: which later sections utilize.
283: Section~\ref{apparatus} provides the principal lemmas used in bounding
284: the accuracy of the algorithm in Section~\ref{algorithm}.
285: Section~\ref{algorithm} describes the algorithm of the present paper.
286: Section~\ref{numerical} illustrates the performance of the algorithm
287: via several numerical examples.
288: The appendix, Section~\ref{appendix}, proves two lemmas stated earlier
289: in Section~\ref{apparatus}.
290: We encourage the reader to begin with Sections~\ref{algorithm}
291: and~\ref{numerical}, referring back to the relevant portions
292: of Sections~\ref{prelims} and~\ref{apparatus} as they are referenced.
293: 
294: 
295: 
296: \section{Preliminaries}
297: \label{prelims}
298: 
299: In this section, we summarize various facts about matrices and functions.
300: Subsection~\ref{general_singular_values} discusses the singular values
301: of arbitrary matrices. Subsection~\ref{random_singular_values}
302: discusses the singular values of certain random matrices.
303: Subsection~\ref{monotone} observes that a certain function is monotone.
304: 
305: 
306: \subsection{Singular values of general matrices}
307: \label{general_singular_values}
308: 
309: 
310: The following trivial technical lemma will be needed
311: in Section~\ref{apparatus}.
312: 
313: \begin{lemma}
314: Suppose that $m$ and $n$ are positive integers with $m \ge n$.
315: Suppose further that $A$ is a real $m \times n$ matrix
316: such that the least (that is, the $n^\ith$ greatest) singular value $\sigma_n$
317: of $A$ is nonzero.
318: 
319: Then,
320: %
321: \begin{equation}
322: \label{pseudoinverse_norm}
323: \left\| (A^\T \, A)^{-1} \, A^\T \right\| = \frac{1}{\sigma_n}.
324: \end{equation}
325: %
326: \end{lemma}
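
One way to verify~(\ref{pseudoinverse_norm}), sketched here for convenience
(the notation is local to this sketch), is via an SVD
$A = U \, \Sigma \, V^\T$,
where $U$ is a real $m \times n$ matrix whose columns are orthonormal,
$V$ is a real $n \times n$ matrix whose columns are orthonormal,
and $\Sigma$ is a real diagonal $n \times n$ matrix whose diagonal entries
are $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_n > 0$.
Then $A^\T \, A = V \, \Sigma^2 \, V^\T$, so that
%
\begin{equation*}
(A^\T \, A)^{-1} \, A^\T
= V \, \Sigma^{-2} \, V^\T \, V \, \Sigma \, U^\T
= V \, \Sigma^{-1} \, U^\T.
\end{equation*}
%
The singular values of $V \, \Sigma^{-1} \, U^\T$ are
$1/\sigma_n \ge \dots \ge 1/\sigma_2 \ge 1/\sigma_1$,
the greatest of which is $1/\sigma_n$.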
327: 
328: 
329: The following lemma states that the greatest singular value of a matrix $A$
330: is at least as large as the greatest singular value
331: of any rectangular block of entries in $A$;
332: the lemma is a straightforward consequence
333: of the minimax properties of singular values
334: (see, for example, Section~47 of Chapter~2 in~\cite{wilkinson}).
335: 
336: \begin{lemma}
337: \label{minimax_consequence}
338: Suppose that $k$, $l$, $m$, and~$n$ are positive integers
339: with $k \le m$ and $l \le n$.
340: Suppose further that $A$ is a real $m \times n$ matrix,
341: and $B$ is a $k \times l$ rectangular block of entries in $A$.
342: 
343: Then, the greatest singular value of $B$ is at most
344: the greatest singular value of $A$.
345: \end{lemma}
346: 
347: 
348: The following classical lemma provides an approximation $Q \, S$
349: to an $n \times l$ matrix $R$
350: via an $n \times k$ matrix $Q$ whose columns are orthonormal,
351: and a $k \times l$ matrix $S$.
352: As remarked in Observation~\ref{least_squares},
353: the proof of this lemma provides a classic algorithm for computing $Q$ and $S$,
354: given $R$. We include the proof since we will be using this algorithm.
355: 
356: \begin{lemma}
357: Suppose that $k$, $l$, and $n$ are positive integers with $k < l \le n$,
358: and $R$ is a real $n \times l$ matrix.
359: 
360: Then, there exist a real $n \times k$ matrix $Q$
361: whose columns are orthonormal,
362: and a real $k \times l$ matrix $S$, such that
363: %
364: \begin{equation}
365: \label{svd_qr}
366: \| Q \, S - R \| \le \rho_{k+1},
367: \end{equation}
368: %
369: where $\rho_{k+1}$ is the $(k+1)^\ist$ greatest singular value of $R$.
370: \end{lemma}
371: 
372: \begin{proof}
373: We start by forming an SVD of $R$,
374: %
375: \begin{equation}
376: \label{little_svd}
377: R = U \, \Sigma \, V^\T,
378: \end{equation}
379: %
380: where $U$ is a real $n \times l$ matrix whose columns are orthonormal,
381: $V$ is a real $l \times l$ matrix whose columns are orthonormal,
382: and $\Sigma$ is a real diagonal $l \times l$ matrix, such that
383: %
384: \begin{equation}
385: \label{little_ordering}
386: \Sigma_{j,j} = \rho_j
387: \end{equation}
388: %
389: for $j = 1$,~$2$, \dots, $l-1$,~$l$,
390: where $\Sigma_{j,j}$ is the entry in row $j$ and column $j$ of $\Sigma$,
391: and $\rho_j$ is the $j^\th$ greatest singular value of $R$.
392: We define $Q$ to be the leftmost $n \times k$ block of $U$,
393: and $P$ to be the rightmost $n \times (l-k)$ block of $U$, so that
394: %
395: \begin{equation}
396: \label{left_sing}
397: U = \left( \begin{array}{c|c} Q & P \end{array} \right).
398: \end{equation}
399: %
400: We define $S$ to be the uppermost $k \times l$ block of $\Sigma \, V^\T$,
401: and $T$ to be the lowermost $(l-k) \times l$ block of $\Sigma \, V^\T$,
402: so that
403: %
404: \begin{equation}
405: \label{right_sing}
406: \Sigma \, V^\T = \left( \begin{array}{c} S \\\hline T \end{array} \right).
407: \end{equation}
408: %
409: Combining~(\ref{little_svd}), (\ref{little_ordering}),
410: (\ref{left_sing}), (\ref{right_sing}),
411: and the fact that the columns of $U$ are orthonormal,
412: as are the columns of $V$, yields~(\ref{svd_qr}).
413: \end{proof}
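
The final step of the proof can be spelled out as follows
(the symbols $\Sigma_2$ and $V_2$ are introduced only for this sketch).
Combining~(\ref{little_svd}), (\ref{left_sing}), and~(\ref{right_sing}) gives
%
\begin{equation*}
R = U \, \Sigma \, V^\T = Q \, S + P \, T,
\qquad \hbox{so that} \qquad
\| Q \, S - R \| = \| P \, T \| = \| T \|,
\end{equation*}
%
where the last equality holds because the columns of $P$ are orthonormal.
By~(\ref{little_ordering}), $T = \Sigma_2 \, V_2^\T$,
where $\Sigma_2$ is the diagonal $(l-k) \times (l-k)$ matrix
with diagonal entries $\rho_{k+1}$, $\rho_{k+2}$, \dots, $\rho_l$,
and $V_2$ is the rightmost $l \times (l-k)$ block of $V$.
Since the columns of $V_2$ are orthonormal,
$\| T \| = \| \Sigma_2 \| = \rho_{k+1}$,
so that~(\ref{svd_qr}) in fact holds with equality for this choice
of $Q$ and $S$.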
414: 
415: 
416: \begin{observation}
417: \label{least_squares}
418: In order to compute the matrices $Q$ and $S$ in~(\ref{svd_qr})
419: from the matrix $R$,
420: we can construct~(\ref{little_svd}),
421: and then form $Q$ and $S$
422: according to~(\ref{left_sing}) and~(\ref{right_sing}).
423: (See, for example, Chapter~8 in~\cite{golub-van_loan} for details
424: concerning the computation of the SVD.)
425: \end{observation}
426: 
427: 
428: 
429: \subsection{Singular values of random matrices}
430: \label{random_singular_values}
431: 
432: 
433: The following lemma provides a highly probable upper bound
434: on the greatest singular value
435: of a square matrix whose entries are independent, identically distributed
436: (i.i.d.) Gaussian random variables of zero mean and unit variance;
437: Formula~8.8 in~\cite{goldstine-von_neumann} provides an equivalent formulation
438: of the lemma.
439: 
440: \begin{lemma}
441: \label{greatest_bound}
442: Suppose that $n$ is a positive integer,
443: $G$ is a real $n \times n$ matrix whose entries are
444: i.i.d.\ Gaussian random variables of zero mean and unit variance,
445: and $\gamma$ is a positive real number, such that $\gamma > 1$ and
446: %
447: \begin{equation}
448: \label{failure_prob}
449: 1 - \frac{1}{4 \, (\gamma^2-1) \, \sqrt{\pi n \gamma^2}}
450:     \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^n
451: \end{equation}
452: %
453: is nonnegative.
454: 
455: Then, the greatest singular value of $G$ is at most $\sqrt{2n} \, \gamma$
456: with probability not less than the amount in~(\ref{failure_prob}).
457: \end{lemma}
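
As a numerical illustration (the values $\gamma = 2$ and $n = 100$
are hypothetical, chosen here only to show the size of the terms),
we have $2 \gamma^2 / e^{\gamma^2-1} = 8/e^3 \approx 0.398$, so that
%
\begin{equation*}
\frac{1}{4 \, (\gamma^2-1) \, \sqrt{\pi n \gamma^2}}
\left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^n
\approx \frac{1}{12 \sqrt{400 \pi}} \; (0.398)^{100}
< 10^{-42}.
\end{equation*}
%
In this case the lemma states that the greatest singular value
of a $100 \times 100$ matrix $G$ is at most
$\sqrt{200} \cdot 2 \approx 28.3$
with probability at least $1 - 10^{-42}$.
Note that the quantity subtracted in~(\ref{failure_prob}) decays with $n$
only when $2 \gamma^2 < e^{\gamma^2-1}$, which fails for $\gamma$
too close to $1$.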
458: 
459: 
460: Combining Lemmas~\ref{minimax_consequence} and~\ref{greatest_bound}
461: yields the following lemma,
462: providing a highly probable upper bound on the greatest singular value
463: of a rectangular matrix whose entries are i.i.d.\ Gaussian
464: random variables of zero mean and unit variance.
465: 
466: \begin{lemma}
467: \label{greatest_value}
468: Suppose that $l$, $m$, and $n$ are positive integers
469: with $n \ge l$ and $n \ge m$.
470: Suppose further that $G$ is a real $l \times m$ matrix whose entries are
471: i.i.d.\ Gaussian random variables of zero mean and unit variance,
472: and $\gamma$ is a positive real number, such that
473: $\gamma > 1$ and~(\ref{failure_prob}) is nonnegative.
474: 
475: Then, the greatest singular value of $G$ is at most $\sqrt{2n} \, \gamma$
476: with probability not less than the amount in~(\ref{failure_prob}).
477: \end{lemma}
478: 
479: 
480: The following lemma provides a highly probable lower bound
481: on the least singular value
482: of a rectangular matrix whose entries are i.i.d.\ Gaussian
483: random variables of zero mean and unit variance;
484: Formula~2.5 in~\cite{chen-dongarra}
485: and the proof of Lemma~4.1 in~\cite{chen-dongarra}
486: together provide an equivalent formulation of Lemma~\ref{least_value}.
487: 
488: \begin{lemma}
489: \label{least_value}
490: Suppose that $j$ and $l$ are positive integers with $j \le l$.
491: Suppose further that $G$ is a real $l \times j$ matrix whose entries are
492: i.i.d.\ Gaussian random variables of zero mean and unit variance,
493: and $\beta$ is a positive real number, such that
494: %
495: \begin{equation}
496: \label{failure_prob2}
497: 1 - \frac{1}{\sqrt{2 \pi \, (l-j+1)}}
498:  \, \left( \frac{e}{(l-j+1) \, \beta} \right)^{l-j+1}
499: \end{equation}
500: %
501: is nonnegative.
502: 
503: Then, the least (that is, the $j^\ith$ greatest) singular value
504: of $G$ is at least $1 / (\sqrt{l} \; \beta)$
505: with probability not less than the amount in~(\ref{failure_prob2}).
506: \end{lemma}
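
As a numerical illustration (the values $l = j + 12$ and $\beta = 2.57$
are assumptions made here only for the sake of the example),
we have $l - j + 1 = 13$, and the quantity subtracted
in~(\ref{failure_prob2}) is
%
\begin{equation*}
\frac{1}{\sqrt{2 \pi \cdot 13}}
\left( \frac{e}{13 \cdot 2.57} \right)^{13}
\approx 0.111 \cdot (0.0814)^{13}
\approx 8 \times 10^{-16},
\end{equation*}
%
which is less than $10^{-15}$.
Under these assumptions, the least singular value of $G$
is at least $1 / (2.57 \, \sqrt{l})$
with probability greater than $1 - 10^{-15}$.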
507: 
508: 
509: 
510: \subsection{A monotone function}
511: \label{monotone}
512: 
513: 
514: The following technical lemma will be needed
515: in Section~\ref{algorithm}.
516: 
517: \begin{lemma}
518: \label{monotonicity}
519: Suppose that $\alpha$ is a nonnegative real number,
520: and $f$ is the function defined on $(0,\infty)$ via the formula
521: %
522: \begin{equation}
523: f(x) = \frac{1}{\sqrt{2 \pi x}} \left( \frac{e\alpha}{x} \right)^x.
524: \end{equation}
525: 
526: Then, $f$ decreases monotonically for $x > \alpha$.
527: \end{lemma}
528: 
529: \begin{proof}
530: The derivative of $f$ is
531: %
532: \begin{equation}
533: \label{derivative}
534: f'(x) = f(x) \left( \ln\left(\frac{\alpha}{x}\right) - \frac{1}{2x} \right)
535: \end{equation}
536: %
537: for any positive real number $x$.
538: The right-hand side of~(\ref{derivative}) is negative when $x > \alpha$.
539: \end{proof}
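
The formula~(\ref{derivative}) can be checked by logarithmic differentiation,
assuming $\alpha > 0$ so that the logarithms are defined.
Indeed,
%
\begin{equation*}
\ln f(x) = -\frac{1}{2} \ln(2 \pi x) + x \, (1 + \ln\alpha - \ln x),
\end{equation*}
%
and differentiating both sides yields
%
\begin{equation*}
\frac{f'(x)}{f(x)} = -\frac{1}{2x} + (1 + \ln\alpha - \ln x) - 1
= \ln\left(\frac{\alpha}{x}\right) - \frac{1}{2x}.
\end{equation*}
%
When $x > \alpha$, both $\ln(\alpha/x)$ and $-1/(2x)$ are negative,
while $f(x)$ is positive, so that $f'(x) < 0$.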
540: 
541: 
542: 
543: \section{Mathematical apparatus}
544: \label{apparatus}
545: 
546: In this section, we provide lemmas to be used in Section~\ref{algorithm}
547: in bounding the accuracy of the algorithm of the present paper.
548: 
549: The following lemma, proven in the appendix (Section~\ref{appendix}),
550: shows that the product $A \, Q \, Q^\T$
551: of matrices $A$, $Q$, and $Q^\T$
552: is a good approximation to a matrix $A$,
553: provided that there exist matrices $G$ and $S$ such that
554: %
555: \begin{enumerate}
556: %
557: \item[1.] the columns of $Q$ are orthonormal,
558: %
559: \item[2.] $Q \, S$ is a good approximation to $(G \, (A \, A^\T)^i \, A)^\T$,
560: and
561: %
562: \item[3.] there exists a matrix $F$ such that $\| F \|$ is not too large,
563: and $F \, G \, (A \, A^\T)^i \, A$ is a good approximation to $A$.
564: %
565: \end{enumerate}
566: 
567: \begin{lemma}
568: \label{all_together2}
569: Suppose that $i$, $k$, $l$, $m$, and~$n$ are positive integers
570: with $k \le l \le m \le n$.
571: Suppose further that $A$ is a real $m \times n$ matrix,
572: $Q$ is a real $n \times k$ matrix whose columns are orthonormal,
573: $S$ is a real $k \times l$ matrix,
574: $F$ is a real $m \times l$ matrix,
575: and $G$ is a real $l \times m$ matrix.
576: 
577: Then,
578: %
579: \begin{equation}
580: \label{reconstruction2}
581: \| A \, Q \, Q^\T - A \|
582: \le 2 \, \| F \, G \, (A \, A^\T)^i \, A - A \|
583:   + 2 \, \| F \| \, \| Q \, S - (G \, (A \, A^\T)^i \, A)^\T \|.
584: \end{equation}
585: %
586: \end{lemma}
587: 
588: 
589: The following lemma, proven in the appendix (Section~\ref{appendix}),
590: states that,
591: for any positive integer $i$, matrix $A$, and matrix $G$ whose entries are
592: i.i.d.\ Gaussian random variables of zero mean and unit variance,
593: with very high probability there exists a matrix $F$
594: with a reasonably small norm,
595: such that $F \, G \, (A \, A^\T)^i \, A$ is a good approximation to $A$.
596: This lemma is similar to Lemma~19 of~\cite{martinsson-rokhlin-tygert3}.
597: 
598: \begin{lemma}
599: \label{probability_bounds2}
600: Suppose that $i$, $j$, $k$, $l$, $m$, and~$n$ are positive integers
601: with $j < k < l < m \le n$.
602: Suppose further that $A$ is a real $m \times n$ matrix,
603: $G$ is a real $l \times m$ matrix whose entries are
604: i.i.d.\ Gaussian random variables of zero mean and unit variance,
605: and $\beta$ and $\gamma$ are positive real numbers, such that
606: the $j^\ith$ greatest singular value $\sigma_j$ of $A$ is positive,
607: $\gamma > 1$, and
608: %
609: \begin{multline}
610: \label{probability2}
611: \Phi
612:   = 1 - \frac{1}{\sqrt{2 \pi \, (l-j+1)}}
613:  \, \left( \frac{e}{(l-j+1) \, \beta} \right)^{l-j+1} \\
614:   - \frac{1}{4 \, (\gamma^2-1) \, \sqrt{\pi \, \max(m-k,l) \; \gamma^2}}
615:     \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^{\max(m-k,\,l)} \\
616:   - \frac{1}{4 \, (\gamma^2-1) \, \sqrt{\pi \, l \, \gamma^2}}
617:     \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^l
618: \end{multline}
619: %
620: is nonnegative.
621: 
622: Then, there exists a real $m \times l$ matrix $F$ such that
623: %
624: \begin{multline}
625: \label{approximation2}
626: \| F \, G \, (A \, A^\T)^i \, A - A \|
627: \le \sqrt{ 2 l^2 \, \beta^2 \, \gamma^2 + 1 }
628:  \;\; \sigma_{j+1} \\
629:   + \sqrt{ 2 l \, \max(m-k,l) \, \beta^2 \, \gamma^2
630:         \, \left( \frac{\sigma_{k+1}}{\sigma_j} \right)^{4i} + 1 }
631:  \;\; \sigma_{k+1}
632: \end{multline}
633: %
634: and
635: %
636: \begin{equation}
637: \label{small_norm2}
638: \| F \| \le \frac{\sqrt{l} \; \beta}{(\sigma_j)^{2i}}
639: \end{equation}
640: %
641: with probability not less than $\Phi$ defined in~(\ref{probability2}),
642: where $\sigma_j$ is the $j^\ith$ greatest singular value of $A$,
643: $\sigma_{j+1}$ is the $(j+1)^\ist$ greatest singular value of $A$,
644: and $\sigma_{k+1}$ is the $(k+1)^\ist$ greatest singular value of $A$.
645: \end{lemma}
646: 
647: 
648: Given a matrix $A$,
649: and a matrix $G$ whose entries are i.i.d.\ Gaussian random variables
650: of zero mean and unit variance,
651: the following lemma provides a highly probable upper bound
652: on the singular values of the product $G \, A$
653: in terms of the singular values of $A$.
654: This lemma is reproduced from~\cite{martinsson-rokhlin-tygert3},
655: where it appears as Lemma~20.
656: 
657: \begin{lemma}
658: \label{singular_value_stretching}
659: Suppose that $j$, $k$, $l$, $m$, and~$n$ are positive integers
660: with $k < l$, such that $k + j < m$ and $k + j < n$.
661: Suppose further that $A$ is a real $m \times n$ matrix,
662: $G$ is a real $l \times m$ matrix whose entries are
663: i.i.d.\ Gaussian random variables of zero mean and unit variance,
664: and $\gamma$ is a positive real number, such that
665: $\gamma > 1$ and
666: %
667: \begin{multline}
668: \label{probability3}
669: \Xi
670:   = 1 - \frac{1}{4 \, (\gamma^2-1) \, \sqrt{\pi \, \max(m-k-j,l) \, \gamma^2}}
671:     \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^{\max(m-k-j,\,l)} \\
672:   - \frac{1}{4 \, (\gamma^2-1) \, \sqrt{\pi \, \max(k+j,l) \; \gamma^2}}
673:     \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^{\max(k+j,\,l)}
674: \end{multline}
675: %
676: is nonnegative.
677: 
678: Then,
679: %
680: \begin{equation}
681: \label{stretched_singular_value}
682: \rho_{k+1} \le \sqrt{2 \, \max(k+j,l)} \; \gamma \; \sigma_{k+1}
683:              + \sqrt{2 \, \max(m-k-j,l)} \; \gamma \; \sigma_{k+j+1}
684: \end{equation}
685: %
686: with probability not less than $\Xi$ defined in~(\ref{probability3}),
687: where $\rho_{k+1}$ is the $(k+1)^\ist$ greatest singular value of $G \, A$,
688: $\sigma_{k+1}$ is the $(k+1)^\ist$ greatest singular value of $A$,
689: and $\sigma_{k+j+1}$ is the $(k+j+1)^\ist$ greatest singular value of $A$.
690: \end{lemma}
691: 
692: 
693: The following corollary follows immediately from the preceding lemma,
694: by replacing the matrix $A$ with $(A \, A^\T)^i \, A$,
695: the integer $k$ with $j$, and the integer $j$ with $k-j$.
696: 
697: \begin{corollary}
698: \label{singular_value_stretching2}
699: Suppose $i$, $j$, $k$, $l$, $m$, and~$n$ are positive integers
700: with $j < k < l < m \le n$.
701: Suppose further that $A$ is a real $m \times n$ matrix,
702: $G$ is a real $l \times m$ matrix whose entries are
703: i.i.d.\ Gaussian random variables of zero mean and unit variance,
704: and $\gamma$ is a positive real number, such that
705: $\gamma > 1$ and
706: %
707: \begin{multline}
708: \label{probability32}
709: \Psi
710:   = 1 - \frac{1}{4 \, (\gamma^2-1) \, \sqrt{\pi \, \max(m-k,l) \, \gamma^2}}
711:     \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^{\max(m-k,\,l)} \\
712:   - \frac{1}{4 \, (\gamma^2-1) \, \sqrt{\pi \, l \; \gamma^2}}
713:     \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^l
714: \end{multline}
715: %
716: is nonnegative.
717: 
718: Then,
719: %
720: \begin{equation}
721: \label{stretched_singular_value2}
722: \rho_{j+1} \le \sqrt{2 l} \; \gamma \; (\sigma_{j+1})^{2i+1}
723:              + \sqrt{2 \, \max(m-k,l)} \; \gamma \; (\sigma_{k+1})^{2i+1}
724: \end{equation}
725: %
726: with probability not less than $\Psi$ defined in~(\ref{probability32}),
727: where $\rho_{j+1}$ is the $(j+1)^\ist$ greatest singular value
728: of $G \, (A \, A^\T)^i \, A$,
729: $\sigma_{j+1}$ is the $(j+1)^\ist$ greatest singular value of $A$,
730: and $\sigma_{k+1}$ is the $(k+1)^\ist$ greatest singular value of $A$.
731: \end{corollary}
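
The substitution can be spelled out as follows
(a bookkeeping sketch; the index $p$ is introduced only for this purpose).
If $\sigma_p$ denotes the $p^\th$ greatest singular value of $A$,
then the $p^\th$ greatest singular value of $(A \, A^\T)^i \, A$
is $(\sigma_p)^{2i+1}$.
Replacing $k$ with $j$ and $j$ with $k-j$
in Lemma~\ref{singular_value_stretching} turns $k+j$ into $k$, so that
%
\begin{equation*}
\max(k+j,l) \to \max(k,l) = l,
\qquad
\max(m-k-j,l) \to \max(m-k,l),
\end{equation*}
%
where the equality uses $k < l$,
while $\sigma_{k+1}$ and $\sigma_{k+j+1}$ become
$(\sigma_{j+1})^{2i+1}$ and $(\sigma_{k+1})^{2i+1}$, respectively.
These replacements convert~(\ref{probability3}) into~(\ref{probability32})
and~(\ref{stretched_singular_value})
into~(\ref{stretched_singular_value2}).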
732: 
733: 
734: 
735: \section{Description of the algorithm}
736: \label{algorithm}
737: 
738: In this section, we describe the algorithm of the present paper,
739: providing details about its accuracy and computational costs.
740: Subsection~\ref{main_algorithm} describes the basic algorithm.
741: Subsection~\ref{costs} tabulates the computational costs of the algorithm.
742: Subsection~\ref{modified} describes a complementary algorithm.
743: Subsection~\ref{blanczos} describes a computationally more expensive variant
744: that is somewhat more accurate and more tolerant of roundoff.
745: 
746: 
747: 
748: \subsection{The algorithm}
749: \label{main_algorithm}
750: 
751: Suppose that $i$, $k$, $m$, and $n$ are positive integers
752: with $2k < m \le n$, and $A$ is a real $m \times n$ matrix.
753: In this subsection, we will construct an approximation to an SVD of $A$
754: such that
755: %
756: \begin{equation}
757: \label{sort_of_svd}
758: \| A - U \, \Sigma \, V^\T \| \le C \, m^{1/(4i+2)} \, \sigma_{k+1}
759: \end{equation}
760: %
761: with very high probability,
762: where $U$ is a real $m \times k$ matrix
763: whose columns are orthonormal,
764: $V$ is a real $n \times k$ matrix whose columns are orthonormal,
765: $\Sigma$ is a real diagonal $k \times k$ matrix
766: whose entries are all nonnegative,
767: $\sigma_{k+1}$ is the $(k+1)^\st$ greatest singular value of $A$,
768: and $C$ is a constant independent of $A$ that depends on the parameters
769: of the algorithm.
770: (Section~\ref{numerical} will give an empirical indication of the size of $C$,
771: and~(\ref{explicit_eval}) will give one of our best theoretical estimates
772: to date.)
773: 
774: Intuitively, we could apply $A^\T$ to several random vectors,
775: in order to identify the part of its range corresponding
776: to the larger singular values.
777: To enhance the decay of the singular values,
778: we apply $A^\T \, (A \, A^\T)^i$ instead.
779: Once we have identified most of the range of $A^\T$,
780: we perform several linear-algebraic manipulations in order to recover
781: an approximation to $A$.
782: (It is possible to obtain a similar, somewhat less accurate algorithm
783: by substituting our short, fat matrix $A$ for $A^\T$, and $A^\T$ for $A$.)
784: 
785: More precisely, we choose an integer $l > k$ such that $l \le m-k$
786: (for example, $l = k + 12$), and perform the following five steps:
787: 
788: \begin{enumerate}
789: %
790: \item[1.] Using a random number generator,
791: form a real $l \times m$ matrix $G$ whose entries are
792: i.i.d.\ Gaussian random variables of zero mean and unit variance,
793: and compute the $l \times n$ product matrix
794: %
795: \begin{equation}
796: \label{product2}
797: R = G \, (A \, A^\T)^i \, A.
798: \end{equation}
799: %
800: \item[2.] Using an SVD,
801: form a real $n \times k$ matrix $Q$ whose columns are orthonormal,
802: such that there exists a real $k \times l$ matrix $S$ for which
803: %
804: \begin{equation}
805: \label{good_approx2}
806: \| Q \, S - R^\T \| \le \rho_{k+1},
807: \end{equation}
808: %
809: where $\rho_{k+1}$ is the $(k+1)^\st$ greatest singular value of $R$.
810: (See Observation~\ref{least_squares} for details concerning
811: the construction of such a matrix $Q$.)
812: %
813: \item[3.] Compute the $m \times k$ product matrix
814: %
815: \begin{equation}
816: \label{product_t}
817: T = A \, Q.
818: \end{equation}
819: %
820: \item[4.] Form an SVD of $T$,
821: %
822: \begin{equation}
823: \label{svd_small}
824: T = U \, \Sigma \, W^\T,
825: \end{equation}
826: %
827: where $U$ is a real $m \times k$ matrix whose columns are orthonormal,
828: $W$ is a real $k \times k$ matrix whose columns are orthonormal,
829: and $\Sigma$ is a real diagonal $k \times k$ matrix
830: whose entries are all nonnegative.
831: (See, for example, Chapter~8 in~\cite{golub-van_loan} for details
832: concerning the construction of such an SVD.)
833: %
834: \item[5.] Compute the $n \times k$ product matrix
835: %
836: \begin{equation}
837: \label{product3}
838: V = Q \, W.
839: \end{equation}
840: %
841: \end{enumerate}
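
As a check on the five steps above, the following short derivation
(added here for convenience, and using only~(\ref{product_t}),
(\ref{svd_small}), and~(\ref{product3}))
shows that the output of the algorithm is the result of projecting
the rows of $A$ onto the range of $Q$:
%
\begin{equation*}
U \, \Sigma \, V^\T = U \, \Sigma \, (Q \, W)^\T
= (U \, \Sigma \, W^\T) \, Q^\T = T \, Q^\T = A \, Q \, Q^\T.
\end{equation*}
%
Thus, the accuracy of the approximation is determined entirely by $Q$;
Steps~3--5 serve only to put $A \, Q \, Q^\T$ in the form of an SVD.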
842: 
843: 
844: The following theorem states precisely
845: that the matrices $U$, $\Sigma$, and $V$ satisfy~(\ref{sort_of_svd}).
846: See~(\ref{explicit_eval}) for a more compact (but less general) formulation.
847: 
848: \begin{theorem}
849: \label{the_theorem}
850: Suppose that $i$, $k$, $l$, $m$, and $n$ are positive integers
851: with $k < l \le m-k$ and $m \le n$, and $A$ is a real $m \times n$ matrix.
852: Suppose further that $\beta$ and $\gamma$ are positive real numbers
853: such that $\gamma>1$,
854: %
855: \begin{equation}
856: \label{monotonicity_assump}
857: (l-k+1) \, \beta \ge 1,
858: \end{equation}
859: %
860: \begin{equation}
861: \label{simplifying_assump}
862: 2 \, l^2 \, \gamma^2 \, \beta^2 \ge 1,
863: \end{equation}
864: %
865: and
866: %
867: \begin{multline}
868: \label{final_prob}
869: \Pi = 1 - \frac{1}{2 \, (\gamma^2-1) \, \sqrt{\pi \, (m-k) \, \gamma^2}}
870:       \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^{m-k}
871:     - \frac{1}{2 \, (\gamma^2-1) \, \sqrt{\pi \, l \; \gamma^2}}
872:       \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^l \\
873:     - \frac{1}{\sqrt{2 \pi \, (l-k+1)}}
874:    \, \left( \frac{e}{(l-k+1) \, \beta} \right)^{l-k+1}
875: \end{multline}
876: %
877: is nonnegative.
878: Suppose in addition that $U$, $\Sigma$, and $V$ are the matrices
879: produced via the five-step algorithm of the present subsection, given above.
880: 
881: Then,
882: %
883: \begin{equation}
884: \label{the_point}
885: \| A - U \, \Sigma \, V^\T \| \le 16 \, \gamma \, \beta \, l
886: \, \left(\frac{m-k}{l}\right)^{1/(4i+2)} \, \sigma_{k+1}
887: \end{equation}
888: %
889: with probability not less than $\Pi$,
890: where $\Pi$ is defined in~(\ref{final_prob}),
891: and $\sigma_{k+1}$ is the $(k+1)^\ist$ greatest singular value of $A$.
892: \end{theorem}
893: 
894: \begin{proof}
895: Observing that $U \, \Sigma \, V^\T = A \, Q \, Q^\T$,
896: it is sufficient to prove that
897: %
898: \begin{equation}
899: \label{intermediate_step}
900: \| A \, Q \, Q^\T - A \| \le 16 \, \gamma \, \beta \, l
901: \, \left(\frac{m-k}{l}\right)^{1/(4i+2)} \, \sigma_{k+1}
902: \end{equation}
903: %
904: with probability not less than $\Pi$,
905: where $Q$ is the matrix from~(\ref{good_approx2}),
906: since combining~(\ref{intermediate_step}), (\ref{product_t}),
907: (\ref{svd_small}), and~(\ref{product3}) yields~(\ref{the_point}).
908: We now prove~(\ref{intermediate_step}).
909: 
910: First, we consider the case when
911: %
912: \begin{equation}
913: \label{first_case}
914: \| A \| \le \left(\frac{m-k}{l}\right)^{1/(4i+2)} \, \sigma_{k+1}.
915: \end{equation}
916: %
917: Clearly,
918: %
919: \begin{equation}
920: \label{triangle_submult}
921: \| A \, Q \, Q^\T - A \| \le \| A \| \, \| Q \| \, \| Q^\T \| + \| A \|.
922: \end{equation}
923: %
924: But, it follows from the fact that the columns of $Q$ are orthonormal that
925: %
926: \begin{equation}
927: \label{normortho1}
928: \| Q \| \le 1
929: \end{equation}
930: %
931: and
932: %
933: \begin{equation}
934: \label{normortho2}
935: \| Q^\T \| \le 1.
936: \end{equation}
937: %
938: Combining~(\ref{triangle_submult}), (\ref{normortho1}), (\ref{normortho2}),
939: (\ref{first_case}), and~(\ref{simplifying_assump})
940: yields~(\ref{intermediate_step}), completing the proof
941: for the case when~(\ref{first_case}) holds.
942: 
943: For the remainder of the proof, we consider the case when
944: %
945: \begin{equation}
946: \label{second_case}
947: \| A \| > \left(\frac{m-k}{l}\right)^{1/(4i+2)} \, \sigma_{k+1}.
948: \end{equation}
949: %
950: To prove~(\ref{intermediate_step}),
951: we will use~(\ref{reconstruction2})
952: (which is restated and proven in Lemma~\ref{all_together22} in the appendix),
953: namely,
954: %
955: \begin{equation}
956: \label{basic_bound}
957: \| A \, Q \, Q^\T - A \|
958: \le 2 \, \| F \, G \, (A \, A^\T)^i \, A - A \|
959:   + 2 \, \| F \| \, \| Q \, S - (G \, (A \, A^\T)^i \, A)^\T \|
960: \end{equation}
961: %
962: for any real $m \times l$ matrix $F$,
963: where $G$ is from~(\ref{product2}),
964: and $Q$ and $S$ are from~(\ref{good_approx2}).
965: We now choose an appropriate matrix $F$.
966: 
967: First, we define $j$ to be the positive integer such that
968: %
969: \begin{equation}
970: \label{reduced_rank}
971: \sigma_{j+1} \le \left(\frac{m-k}{l}\right)^{1/(4i+2)} \, \sigma_{k+1}
972:                < \sigma_j,
973: \end{equation}
974: %
975: where $\sigma_j$ is the $j^\th$ greatest singular value of $A$,
976: and $\sigma_{j+1}$ is the $(j+1)^\st$ greatest
977: (such an integer $j$ exists due to~(\ref{second_case})
978: and the supposition of the theorem that $l \le m-k$).
979: We then use the matrix $F$ from~(\ref{approximation2})
980: and~(\ref{small_norm2}) associated with this integer $j$, so that
981: (as stated in~(\ref{approximation2}) and~(\ref{small_norm2}),
982: which are restated and proven in Lemma~\ref{probability_bounds22}
983: in the appendix)
984: %
985: \begin{multline}
986: \label{number1}
987: \| F \, G \, (A \, A^\T)^i \, A - A \|
988: \le \sqrt{ 2 l^2 \, \beta^2 \, \gamma^2 + 1 }
989:  \;\; \sigma_{j+1} \\
990:   + \sqrt{ 2 l \, \max(m-k,l) \, \beta^2 \, \gamma^2
991:         \, \left( \frac{\sigma_{k+1}}{\sigma_j} \right)^{4i} + 1 }
992:  \;\; \sigma_{k+1}
993: \end{multline}
994: %
995: and
996: %
997: \begin{equation}
998: \label{number2}
999: \| F \| \le \frac{\sqrt{l} \; \beta}{(\sigma_j)^{2i}}
1000: \end{equation}
1001: %
1002: with probability not less than $\Phi$ defined in~(\ref{probability2}).
1003: Formula~(\ref{number1}) bounds the first term in the right-hand side
1004: of~(\ref{basic_bound}).
1005: 
1006: To bound the second term in the right-hand side of~(\ref{basic_bound}),
1007: we observe that $j \le k$, due to~(\ref{reduced_rank})
1008: and the supposition of the theorem that $l \le m-k$.
1009: Combining~(\ref{good_approx2}), (\ref{product2}),
1010: (\ref{stretched_singular_value2}), and the fact that $j \le k$ yields
1011: %
1012: \begin{equation}
1013: \label{number3}
1014: \| Q \, S - (G \, (A \, A^\T)^i \, A)^\T \|
1015: \le \sqrt{2 l} \; \gamma \; (\sigma_{j+1})^{2i+1}
1016:   + \sqrt{2 \, \max(m-k,l)} \; \gamma \; (\sigma_{k+1})^{2i+1}
1017: \end{equation}
1018: %
1019: with probability not less than $\Psi$ defined in~(\ref{probability32}).
1020: %
1021: Combining~(\ref{number2}) and~(\ref{number3}) yields
1022: %
1023: \begin{multline}
1024: \label{number4}
1025: \| F \| \, \| Q \, S - (G \, (A \, A^\T)^i \, A)^\T \|
1026: \le \sqrt{2 \, l^2 \, \gamma^2 \, \beta^2} \; \sigma_{j+1} \\
1027:   + \sqrt{2 \, l \, \max(m-k,l) \, \gamma^2 \, \beta^2
1028:  \, \left(\frac{\sigma_{k+1}}{\sigma_j}\right)^{4i}} \;\; \sigma_{k+1}
1029: \end{multline}
1030: %
1031: with probability not less than $\Pi$ defined in~(\ref{final_prob}).
1032: The combination of Lemma~\ref{monotonicity}, (\ref{monotonicity_assump}),
1033: and the fact that $j \le k$ justifies the use of $k$
1034: (rather than the $j$ used in~(\ref{probability2}) for $\Phi$)
1035: in the last term in the right-hand side of~(\ref{final_prob}).
1036: 
1037: Combining~(\ref{basic_bound}), (\ref{number1}), (\ref{number4}),
1038: (\ref{reduced_rank}), (\ref{simplifying_assump}),
1039: and the supposition of the theorem that $l \le m-k$
1040: yields~(\ref{intermediate_step}), completing the proof.
1041: \end{proof}
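
For readers who wish to trace the constant $16$
in~(\ref{intermediate_step}), the final combination in the proof above
can be sketched as follows.
The shorthand $r$ and $\mu$ is introduced solely for this sketch
and is not used elsewhere in the paper.
Set
%
\begin{equation*}
r = \left(\frac{m-k}{l}\right)^{1/(4i+2)}
\quad \hbox{and} \quad
\mu = l \, \gamma \, \beta,
\end{equation*}
%
so that $r \ge 1$ (as $l \le m-k$)
and $2 \, \mu^2 \ge 1$ (by~(\ref{simplifying_assump})).
In the first case~(\ref{first_case}),
$\| A \, Q \, Q^\T - A \| \le 2 \, \| A \| \le 2 \, r \, \sigma_{k+1}
\le 2 \sqrt{2} \, \mu \, r \, \sigma_{k+1}$.
In the second case~(\ref{second_case}), (\ref{reduced_rank}) gives
$\sigma_{j+1} \le r \, \sigma_{k+1}$ and $\sigma_{k+1} \le \sigma_j / r$,
so that
%
\begin{equation*}
l \, \max(m-k,l) \, \left(\frac{\sigma_{k+1}}{\sigma_j}\right)^{4i}
\le l \, (m-k) \, r^{-4i} = l^2 \, r^{4i+2} \, r^{-4i} = l^2 \, r^2.
\end{equation*}
%
Since $1 \le 2 \, \mu^2 \le 2 \, \mu^2 \, r^2$,
the right-hand side of~(\ref{number1}) is then
at most $2 \, \mu \, r \, \sigma_{k+1} + 2 \, \mu \, r \, \sigma_{k+1}$,
and the right-hand side of~(\ref{number4}) is
at most $\sqrt{2} \, \mu \, r \, \sigma_{k+1}
+ \sqrt{2} \, \mu \, r \, \sigma_{k+1}$.
Substituting these into~(\ref{basic_bound}) yields
%
\begin{equation*}
\| A \, Q \, Q^\T - A \|
\le 2 \cdot 4 \, \mu \, r \, \sigma_{k+1}
  + 2 \cdot 2 \sqrt{2} \, \mu \, r \, \sigma_{k+1}
= (8 + 4 \sqrt{2}) \, \mu \, r \, \sigma_{k+1}
< 16 \, \mu \, r \, \sigma_{k+1},
\end{equation*}
%
which is~(\ref{intermediate_step}).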
1042: 
1043: \begin{remark}
1044: \label{par_remark}
1045: Choosing~$l=k+12$, $\beta = 2.57$, and $\gamma = 2.43$ in~(\ref{final_prob})
1046: and~(\ref{the_point}) yields
1047: %
1048: \begin{equation}
1049: \label{explicit_eval}
1050: \| A - U \, \Sigma \, V^\T \| \le 100 \, l
1051: \, \left(\frac{m-k}{l}\right)^{1/(4i+2)} \, \sigma_{k+1}
1052: \end{equation}
1053: %
1054: with probability greater than $1-10^{-15}$,
1055: where $\sigma_{k+1}$ is the $(k+1)^\st$ greatest singular value of $A$.
1056: Numerical experiments (some of which are reported in Section~\ref{numerical})
1057: indicate that the factor $100 l$ in the right-hand side
1058: of~(\ref{explicit_eval}) is much greater than necessary.
1059: \end{remark}
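
The constants quoted in Remark~\ref{par_remark} can be checked
by direct arithmetic; the following is a rough evaluation,
with all decimal values rounded.
For the factor in~(\ref{the_point}),
%
\begin{equation*}
16 \, \gamma \, \beta = 16 \times 2.43 \times 2.57 = 99.9216 < 100.
\end{equation*}
%
The assumptions~(\ref{monotonicity_assump})
and~(\ref{simplifying_assump}) hold, since
$(l-k+1) \, \beta = 13 \times 2.57 = 33.41 \ge 1$
and $2 \, l^2 \, \gamma^2 \, \beta^2 \ge 1$.
With $l = k+12$, the last term in~(\ref{final_prob})
does not depend on $k$:
%
\begin{equation*}
\frac{1}{\sqrt{2 \pi \cdot 13}}
\, \left( \frac{e}{33.41} \right)^{13} \approx 7.6 \times 10^{-16}.
\end{equation*}
%
The other two terms subtracted in~(\ref{final_prob})
decrease as $l$ and $m-k$ grow;
by our estimate, each is at most about $1.2 \times 10^{-16}$
(the value for $l = m-k = 13$),
so that the sum of all three terms is less than $10^{-15}$.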
1060: 
1061: 
1062: \begin{remark}
1063: \label{six-step}
1064: Above, we permit $l$ to be any integer with $k < l \le m-k$.
1065: Stronger theoretical bounds on the accuracy are available when $l \ge 2k$.
1066: Indeed, via an analysis similar to the proof of Theorem~\ref{the_theorem}
1067: (using in addition the result stated in the abstract of~\cite{chen-dongarra}),
1068: it can be shown that the following six-step algorithm with $l \ge 2k$
1069: produces matrices $U$, $\Sigma$, and $V$ satisfying the bound~(\ref{the_point})
1070: with its right-hand side reduced by a factor of $\sqrt{l}$:
1071: %
1072: \begin{enumerate}
1073: %
1074: \item[1.] Using a random number generator,
1075: form a real $l \times m$ matrix $G$ whose entries are
1076: i.i.d.\ Gaussian random variables of zero mean and unit variance,
1077: and compute the $l \times n$ product matrix
1078: %
1079: \begin{equation}
1080: \label{product2a}
1081: R = G \, (A \, A^\T)^i \, A.
1082: \end{equation}
1083: %
1084: \item[2.] Using a pivoted $QR$-decomposition algorithm,
1085: form a real $n \times l$ matrix $Q$ whose columns are orthonormal,
1086: such that there exists a real $l \times l$ matrix $S$ for which
1087: %
1088: \begin{equation}
1089: \label{good_approx2a}
1090: R^\T = Q \, S.
1091: \end{equation}
1092: %
1093: (See, for example, Chapter~5 in~\cite{golub-van_loan} for details concerning
1094: the construction of such a matrix $Q$.)
1095: %
1096: \item[3.] Compute the $m \times l$ product matrix
1097: %
1098: \begin{equation}
1099: \label{product_ta}
1100: T = A \, Q.
1101: \end{equation}
1102: %
1103: \item[4.] Form an SVD of $T$,
1104: %
1105: \begin{equation}
1106: \label{svd_smalla}
1107: T = \tilde{U} \, \tilde{\Sigma} \, W^\T,
1108: \end{equation}
1109: %
1110: where $\tilde{U}$ is a real $m \times l$ matrix whose columns are orthonormal,
1111: $W$ is a real $l \times l$ matrix whose columns are orthonormal,
1112: and $\tilde{\Sigma}$ is a real diagonal $l \times l$ matrix
1113: whose only nonzero entries are nonnegative and appear in nonincreasing order
1114: on the diagonal.
1115: (See, for example, Chapter~8 in~\cite{golub-van_loan} for details
1116: concerning the construction of such an SVD.)
1117: %
1118: \item[5.] Compute the $n \times l$ product matrix
1119: %
1120: \begin{equation}
1121: \label{product3a}
1122: \tilde{V} = Q \, W.
1123: \end{equation}
1124: %
1125: \item[6.] Extract the leftmost $m \times k$ block $U$ of $\tilde{U}$,
1126: the leftmost $n \times k$ block $V$ of $\tilde{V}$,
1127: and the leftmost uppermost $k \times k$ block $\Sigma$ of $\tilde{\Sigma}$.
1128: %
1129: \end{enumerate}
1130: %
1131: \end{remark}
1132: 
1133: 
1134: 
1135: \subsection{Computational costs}
1136: \label{costs}
1137: 
1138: In this subsection, we tabulate the number of floating-point operations
1139: required by the five-step algorithm described
1140: in Subsection~\ref{main_algorithm} as applied once to a matrix $A$.
1141: 
1142: The algorithm incurs the following costs
1143: in order to compute an approximation to an SVD of $A$:
1144: %
1145: \begin{enumerate}
1146: %
1147: \item[1.] Forming $R$ in~(\ref{product2}) requires applying $A$
1148:           to $il$ column vectors, and $A^\T$ to $(i+1) \, l$ column vectors.
1149: %
1150: \item[2.] Computing $Q$ in~(\ref{good_approx2})
1151:           costs~$\bigoh(l^2 \, n)$.
1152: %
1153: \item[3.] Forming $T$ in~(\ref{product_t}) requires applying $A$
1154:           to $k$ column vectors.
1155: %
1156: \item[4.] Computing the SVD~(\ref{svd_small}) of $T$ costs~$\bigoh(k^2 \, m)$.
1157: %
1158: \item[5.] Forming $V$ in~(\ref{product3}) costs~$\bigoh(k^2 \, n)$.
1159: %
1160: \end{enumerate}
1161: %
1162: Summing up the costs in Steps 1--5 above,
1163: and using the fact that $k \le l \le m \le n$,
1164: we conclude that the algorithm of Subsection~\ref{main_algorithm} costs
1165: %
1166: \begin{equation}
1167: \label{svd_costs}
1168: C_{\rm PCA} = (il+k) \cdot C_A + (il+l) \cdot C_{A^\tinyT} + \bigoh(l^2 \, n)
1169: \end{equation}
1170: %
1171: floating-point operations,
1172: where $C_A$ is the cost of applying $A$ to a real $n \times 1$ column vector,
1173: and $C_{A^\tinyT}$ is the cost of applying $A^\T$
1174: to a real $m \times 1$ column vector.
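
As an illustration of~(\ref{svd_costs}),
consider $i = 1$, $k = 10$, and $l = 12$
(values chosen here for concreteness; they match the parameters
used for several of the examples in Section~\ref{numerical}).
Then,
%
\begin{equation*}
il+k = 1 \cdot 12 + 10 = 22
\quad \hbox{and} \quad
il+l = 1 \cdot 12 + 12 = 24,
\end{equation*}
%
that is, the algorithm applies $A$ to 22 vectors
and $A^\T$ to 24 vectors.
The costs of Steps~2, 4, and~5 combine into the single last term
of~(\ref{svd_costs}), since $k \le l$ and $m \le n$ yield
%
\begin{equation*}
\bigoh(l^2 \, n) + \bigoh(k^2 \, m) + \bigoh(k^2 \, n) = \bigoh(l^2 \, n).
\end{equation*}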
1175: 
1176: \begin{remark}
1177: We observe that the algorithm
1178: only requires applying $A$ to $il+k$ vectors and $A^\T$ to $il+l$ vectors;
1179: it does not require explicit access to the individual entries of $A$.
1180: This consideration can be important when $A$ and $A^\T$ are available solely
1181: in the form of procedures for their applications to arbitrary vectors.
1182: Often such procedures for applying $A$ and $A^\T$ cost much less than
1183: the standard procedure for applying a dense matrix to a vector.
1184: \end{remark}
1185: 
1186: 
1187: 
1188: \subsection{A modified algorithm}
1189: \label{modified}
1190: 
1191: In this subsection, we describe a simple modification
1192: of the algorithm described in Subsection~\ref{main_algorithm}.
1193: Again, suppose that $i$, $k$, $l$, $m$, and $n$ are positive integers
1194: with $k < l \le m-k$ and $m \le n$, and $A$ is a real $m \times n$ matrix.
1195: Then, the following five-step algorithm constructs an approximation
1196: to an SVD of $A^\T$ such that
1197: %
1198: \begin{equation}
1199: \label{sort_of_svdmod}
1200: \| A^\T - U \, \Sigma \, V^\T \| \le C \, m^{1/(4i)} \, \sigma_{k+1}
1201: \end{equation}
1202: %
1203: with very high probability,
1204: where $U$ is a real $n \times k$ matrix whose columns are orthonormal,
1205: $V$ is a real $m \times k$ matrix whose columns are orthonormal,
1206: $\Sigma$ is a real diagonal $k \times k$ matrix
1207: whose entries are all nonnegative,
1208: $\sigma_{k+1}$ is the $(k+1)^\st$ greatest singular value of $A$,
1209: and $C$ is a constant independent of $A$ that depends on the parameters
1210: of the algorithm:
1211: 
1212: \begin{enumerate}
1213: %
1214: \item[1.] Using a random number generator,
1215: form a real $l \times m$ matrix $G$ whose entries are
1216: i.i.d.\ Gaussian random variables of zero mean and unit variance,
1217: and compute the $l \times m$ product matrix
1218: %
1219: \begin{equation}
1220: \label{product2mod}
1221: R = G \, (A \, A^\T)^i.
1222: \end{equation}
1223: %
1224: \item[2.] Using an SVD,
1225: form a real $m \times k$ matrix $Q$ whose columns are orthonormal,
1226: such that there exists a real $k \times l$ matrix $S$ for which
1227: %
1228: \begin{equation}
1229: \label{good_approx2mod}
1230: \| Q \, S - R^\T \| \le \rho_{k+1},
1231: \end{equation}
1232: %
1233: where $\rho_{k+1}$ is the $(k+1)^\st$ greatest singular value of $R$.
1234: (See Observation~\ref{least_squares} for details concerning
1235: the construction of such a matrix $Q$.)
1236: %
1237: \item[3.] Compute the $n \times k$ product matrix
1238: %
1239: \begin{equation}
1240: \label{product_tmod}
1241: T = A^\T \, Q.
1242: \end{equation}
1243: %
1244: \item[4.] Form an SVD of $T$,
1245: %
1246: \begin{equation}
1247: \label{svd_smallmod}
1248: T = U \, \Sigma \, W^\T,
1249: \end{equation}
1250: %
1251: where $U$ is a real $n \times k$ matrix whose columns are orthonormal,
1252: $W$ is a real $k \times k$ matrix whose columns are orthonormal,
1253: and $\Sigma$ is a real diagonal $k \times k$ matrix
1254: whose entries are all nonnegative.
1255: (See, for example, Chapter~8 in~\cite{golub-van_loan} for details
1256: concerning the construction of such an SVD.)
1257: %
1258: \item[5.] Compute the $m \times k$ product matrix
1259: %
1260: \begin{equation}
1261: \label{product3mod}
1262: V = Q \, W.
1263: \end{equation}
1264: %
1265: \end{enumerate}
1266: 
1267: Clearly, (\ref{sort_of_svdmod}) is similar to~(\ref{sort_of_svd}),
1268: as~(\ref{product2mod}) is similar to~(\ref{product2}).
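
To illustrate the difference between the factor $m^{1/(4i+2)}$
in~(\ref{sort_of_svd}) and the factor $m^{1/(4i)}$
in~(\ref{sort_of_svdmod}), take $m = 4096 = 2^{12}$
(a value chosen here only because the roots are easy to evaluate).
For $i = 1$,
%
\begin{equation*}
m^{1/(4i+2)} = 2^{12/6} = 4
\quad \hbox{and} \quad
m^{1/(4i)} = 2^{12/4} = 8,
\end{equation*}
%
while, for $i = 2$,
%
\begin{equation*}
m^{1/(4i+2)} = 2^{12/10} \approx 2.30
\quad \hbox{and} \quad
m^{1/(4i)} = 2^{12/8} \approx 2.83.
\end{equation*}
%
This comparison concerns only the dependence on $m$;
the constants $C$ in the two bounds need not coincide.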
1269: 
1270: \begin{remark}
1271: The ideas of Remark~\ref{six-step}
1272: apply equally to the algorithm of the present subsection.
1273: \end{remark}
1274: 
1275: 
1276: 
1277: \subsection{Blanczos}
1278: \label{blanczos}
1279: 
1280: In this subsection, we describe a modification of the algorithm
1281: of Subsection~\ref{main_algorithm}, enhancing the accuracy
1282: at a little extra computational expense.
1283: Suppose that $i$, $k$, $l$, $m$, and $n$ are positive integers
1284: with $k < l$ and $(i+1)l \le m-k$, and $A$ is a real $m \times n$ matrix,
1285: such that $m \le n$.
1286: Then, the following five-step algorithm constructs an approximation
1287: $U \, \Sigma \, V^\T$ to an SVD of $A$:
1288: 
1289: \begin{enumerate}
1290: %
1291: \item[1.] Using a random number generator,
1292: form a real $l \times m$ matrix $G$ whose entries are
1293: i.i.d.\ Gaussian random variables of zero mean and unit variance,
1294: and compute the $l \times n$ matrices
1295: $R^{(0)}$, $R^{(1)}$, \dots, $R^{(i-1)}$, $R^{(i)}$
1296: defined via the formulae
1297: %
1298: \begin{equation}
1299: R^{(0)} = G \, A,
1300: \end{equation}
1301: %
1302: \begin{equation}
1303: R^{(1)} = R^{(0)} \, A^\T \, A,
1304: \end{equation}
1305: %
1306: \begin{equation}
1307: R^{(2)} = R^{(1)} \, A^\T \, A,
1308: \end{equation}
1309: %
1310: \begin{equation*}
1311: \vdots
1312: \end{equation*}
1313: %
1314: \begin{equation}
1315: R^{(i-1)} = R^{(i-2)} \, A^\T \, A,
1316: \end{equation}
1317: %
1318: \begin{equation}
1319: R^{(i)} = R^{(i-1)} \, A^\T \, A.
1320: \end{equation}
1321: %
1322: Form the $((i+1)l) \times n$ matrix
1323: %
1324: \begin{equation}
1325: \label{product23}
1326: R = \left(\begin{array}{c} R^{(0)} \\ R^{(1)} \\ \vdots \\ R^{(i-1)} \\ R^{(i)}
1327: \end{array}\right).
1328: \end{equation}
1329: %
1330: \item[2.] Using a pivoted $QR$-decomposition algorithm,
1331: form a real $n \times ((i+1)l)$ matrix $Q$ whose columns are orthonormal,
1332: such that there exists a real $((i+1)l) \times ((i+1)l)$ matrix $S$ for which
1333: %
1334: \begin{equation}
1335: \label{good_approx23}
1336: R^\T = Q \, S.
1337: \end{equation}
1338: %
1339: (See, for example, Chapter~5 in~\cite{golub-van_loan} for details concerning
1340: the construction of such a matrix $Q$.)
1341: %
1342: \item[3.] Compute the $m \times ((i+1)l)$ product matrix
1343: %
1344: \begin{equation}
1345: \label{product_t3}
1346: T = A \, Q.
1347: \end{equation}
1348: %
1349: \item[4.] Form an SVD of $T$,
1350: %
1351: \begin{equation}
1352: \label{svd_small3}
1353: T = U \, \Sigma \, W^\T,
1354: \end{equation}
1355: %
1356: where $U$ is a real $m \times ((i+1)l)$ matrix whose columns are orthonormal,
1357: $W$ is a real $((i+1)l) \times ((i+1)l)$ matrix whose columns are orthonormal,
1358: and $\Sigma$ is a real diagonal $((i+1)l) \times ((i+1)l)$ matrix
1359: whose entries are all nonnegative.
1360: (See, for example, Chapter~8 in~\cite{golub-van_loan} for details
1361: concerning the construction of such an SVD.)
1362: %
1363: \item[5.] Compute the $n \times ((i+1)l)$ product matrix
1364: %
1365: \begin{equation}
1366: \label{product33}
1367: V = Q \, W.
1368: \end{equation}
1369: %
1370: \end{enumerate}
1371: 
1372: An analysis similar to the proof of Theorem~\ref{the_theorem} above
1373: shows that the matrices $U$, $\Sigma$, and $V$ produced
1374: by the algorithm of the present subsection satisfy
1375: the same upper bounds~(\ref{the_point}) and~(\ref{explicit_eval})
1376: as the matrices produced by the algorithm of Subsection~\ref{main_algorithm}.
1377: If desired, one may produce a similarly accurate rank-$k$ approximation
1378: by arranging $U$, $\Sigma$, and $V$ such that the diagonal entries
1379: of $\Sigma$ appear in nonincreasing order,
1380: and then discarding all but the leftmost $k$ columns of $U$
1381: and all but the leftmost $k$ columns of $V$,
1382: and retaining only the leftmost uppermost $k \times k$ block of $\Sigma$.
1383: We will refer to the algorithm of the present subsection
1384: as ``blanczos,'' due to its similarity with the block Lanczos method
1385: (see, for example, Subsection~9.2.6 in~\cite{golub-van_loan}
1386: for a description of the block Lanczos method).
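
The relation between blanczos and the algorithm
of Subsection~\ref{main_algorithm} can be made explicit
directly from the definitions in Step~1
(the index $p$ below is local to this sketch).
Unrolling the recurrence gives, for $p = 0$,~$1$, \dots, $i-1$,~$i$,
%
\begin{equation*}
R^{(p)} = G \, A \, (A^\T \, A)^p = G \, (A \, A^\T)^p \, A.
\end{equation*}
%
In particular, the last block $R^{(i)}$ of~(\ref{product23})
coincides (in exact arithmetic) with the matrix $R$
of~(\ref{product2}), so that the range of $R^\T$ in blanczos
contains the range of $R^\T$ in the algorithm
of Subsection~\ref{main_algorithm},
together with the ranges arising from the lower powers
$p = 0$,~$1$, \dots, $i-1$.
This is consistent with the observation in Section~\ref{numerical}
that, in the presence of roundoff, blanczos behaves similarly
to the algorithm of Subsection~\ref{main_algorithm} run with a reduced $i$.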
1387: 
1388: 
1389: 
1390: \section{Numerical results}
1391: \label{numerical}
1392: 
1393: In this section, we illustrate the performance of the algorithm
1394: of the present paper via several numerical examples.
1395: 
1396: We use the algorithm to construct a rank-$k$ approximation,
1397: with $k = 10$, to the $m \times (2m)$ matrix $A$ defined
1398: via its singular value decomposition
1399: %
1400: \begin{equation}
1401: \label{test_matrix}
1402: A = U^{(A)} \, \Sigma^{(A)} \, (V^{(A)})^\T,
1403: \end{equation}
1404: %
1405: where $U^{(A)}$ is an $m \times m$ Hadamard matrix
1406: (a unitary matrix whose entries are all $\pm 1/\sqrt{m}$),
1407: $V^{(A)}$ is a $(2m) \times (2m)$ Hadamard matrix,
1408: and $\Sigma^{(A)}$ is an $m \times (2m)$ matrix
1409: whose entries are zero off the main diagonal,
1410: and whose diagonal entries are defined
1411: in terms of the $(k+1)^\st$ singular value $\sigma_{k+1}$ via the formulae
1412: %
1413: \begin{equation}
1414: \Sigma^{(A)}_{j,j} = \sigma_j = (\sigma_{k+1})^{\lfloor j/2 \rfloor/5}
1415: \end{equation}
1416: %
1417: for $j = 1$,~$2$, \dots, $9$,~$10$,
1418: where $\lfloor j/2 \rfloor$ is the greatest integer less than
1419: or equal to $j/2$, and
1420: %
1421: \begin{equation}
1422: \Sigma^{(A)}_{j,j} = \sigma_j = \sigma_{k+1} \cdot \frac{m-j}{m-11}
1423: \end{equation}
1424: %
1425: for $j = 11$,~$12$, \dots, $m-1$,~$m$.
1426: Thus, $\sigma_1 = 1$ and $\sigma_k = \sigma_{k+1}$ (recall that $k = 10$).
1427: We always choose $\sigma_{k+1} < 1$,
1428: so that $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_{m-1} \ge \sigma_m$.
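
For example, with $\sigma_{k+1} = .001$, the formulae above give
(rounding to three significant digits)
%
\begin{equation*}
\sigma_1 = 1, \quad
\sigma_2 = \sigma_3 = 10^{-0.6} \approx .251, \quad
\sigma_4 = \sigma_5 = 10^{-1.2} \approx .0631,
\end{equation*}
%
\begin{equation*}
\sigma_6 = \sigma_7 = 10^{-1.8} \approx .0158, \quad
\sigma_8 = \sigma_9 = 10^{-2.4} \approx .00398, \quad
\sigma_{10} = \sigma_{11} = .001,
\end{equation*}
%
after which the singular values decrease linearly in $j$
down to $\sigma_m = 0$.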
1429: 
1430: Figure~1 plots the singular values
1431: $\sigma_1$,~$\sigma_2$, \dots, $\sigma_{m-1}$,~$\sigma_m$
1432: of $A$ with $m = 512$ and $\sigma_{k+1} = .001$;
1433: these parameters correspond to the first row of numbers in Table~1,
1434: the first row of numbers in Table~2, and the first row of numbers in Table~6.
1435: 
1436: Table~1 reports the results of applying the five-step algorithm
1437: of Subsection~\ref{main_algorithm} to matrices of various sizes, with $i = 1$.
1438: Table~2 reports the results of applying the five-step algorithm
1439: of Subsection~\ref{main_algorithm} to matrices of various sizes, with $i = 0$.
1440: The algorithms of~\cite{sarlos3}, \cite{sarlos4},
1441: and~\cite{liberty-woolfe-martinsson-rokhlin-tygert}
1442: for low-rank approximation are essentially the same as the algorithm used
1443: for Table~2 (with $i=0$).
1444: 
1445: Table~3 reports the results of applying the five-step algorithms
1446: of Subsections~\ref{main_algorithm} and~\ref{modified}
1447: with varying numbers of iterations $i$.
1448: Rows in the table where $i$ is enclosed in parentheses correspond
1449: to the algorithm of Subsection~\ref{modified};
1450: rows where $i$ is not enclosed in parentheses correspond
1451: to the algorithm of Subsection~\ref{main_algorithm}.
1452: 
1453: Table~4 reports the results of applying the five-step algorithm
1454: of Subsection~\ref{main_algorithm} to matrices
1455: whose best rank-$k$ approximations have varying accuracies.
1456: Table~5 reports the results of applying the blanczos algorithm
1457: of Subsection~\ref{blanczos} to matrices
1458: whose best rank-$k$ approximations have varying accuracies.
1459: 
1460: Table~6 reports the results of calculating pivoted $QR$-decompositions,
1461: via plane (Householder) reflections, of matrices of various sizes.
1462: We computed the pivoted $QR$-decomposition of the transpose of $A$ defined
1463: in~(\ref{test_matrix}), rather than of $A$ itself, for reasons of accuracy
1464: and efficiency. As pivoted $QR$-decomposition requires dense matrix arithmetic,
1465: our 1~GB of random-access memory (RAM) imposed the limit $m \le 4096$
1466: for Table~6.
1467: 
1468: The headings of the tables have the following meanings:
1469: %
1470: \begin{itemize}
1471: %
1472: \item $m$ is the number of rows in $A$, the matrix being approximated.
1473: %
1474: \item $n$ is the number of columns in $A$, the matrix being approximated.
1475: %
1476: \item $i$ is the integer parameter used in the algorithms
1477:       of Subsections~\ref{main_algorithm}, \ref{modified}, and~\ref{blanczos}.
1478:       Rows in the tables where $i$ is enclosed in parentheses correspond
1479:       to the algorithm of Subsection~\ref{modified};
1480:       rows where $i$ is not enclosed in parentheses correspond
1481:       to either the algorithm of Subsection~\ref{main_algorithm} or
1482:       that of Subsection~\ref{blanczos}.
1483: %
1484: \item $t$ is the time in seconds required by the algorithm to create
1485:       an approximation and compute its accuracy $\delta$.
1486: %
1487: \item $\sigma_{k+1}$ is the $(k+1)^\st$ greatest singular value of $A$,
1488:       the matrix being approximated; $\sigma_{k+1}$ is also the accuracy
1489:       of the best possible rank-$k$ approximation to $A$.
1490: %
1491: \item $\delta$ is the accuracy of the approximation $U \, \Sigma \, V^\T$
1492:       (or $(QRP)^\T$, for Table~6) constructed by the algorithm.
1493:       For Tables~1--5,
1494: %
1495: \begin{equation}
1496: \delta = \| A - U \, \Sigma \, V^\T \|,
1497: \end{equation}
1498: %
1499: where $U$ is an $m \times k$ matrix whose columns are orthonormal,
1500: $V$ is an $n \times k$ matrix whose columns are orthonormal,
1501: and $\Sigma$ is a diagonal $k \times k$ matrix whose entries
1502: are all nonnegative; for Table~6,
1503: %
1504: \begin{equation}
1505: \delta = \| A - (QRP)^\T \|,
1506: \end{equation}
1507: %
1508: where $P$ is an $m \times m$ permutation matrix,
1509: $R$ is a $k \times m$ upper-triangular (meaning upper-trapezoidal) matrix,
1510: and $Q$ is an $n \times k$ matrix whose columns are orthonormal.
1511: \end{itemize}
1512: 
1513: The values for $t$ are the average values over 3 independent randomized trials
1514: of the algorithm. The values for $\delta$ are the worst (maximum) values
1515: encountered in 3 independent randomized trials of the algorithm.
1516: The values for $\delta$ in each trial are those produced by 20 iterations
1517: of the power method applied to $A - U \, \Sigma \, V^\T$
1518: (or $A - (QRP)^\T$, for Table~6),
1519: started with a vector whose entries
1520: are i.i.d.\ centered Gaussian random variables.
1521: The theorems of~\cite{dixon} and~\cite{kuczynski-wozniakowski}
1522: guarantee that this power method produces accurate results
1523: with overwhelmingly high probability.
1524: 
1525: We performed all computations using IEEE standard double-precision variables,
1526: whose mantissas have approximately one bit of precision less than 16 digits
1527: (so that the relative precision of the variables is approximately .2E--15).
1528: We ran all computations on one core
1529: of a 1.86~GHz Intel Centrino Core Duo microprocessor
1530: with 2~MB of L2 cache and 1~GB of RAM.
1531: We compiled the Fortran~77 code
1532: using the Lahey/Fujitsu Linux Express v6.2 compiler,
1533: with the optimization flag {\tt {-}{-}o2} enabled.
1534: We implemented a fast Walsh-Hadamard transform
1535: to apply rapidly the Hadamard matrices $U^{(A)}$ and $V^{(A)}$
1536: in~(\ref{test_matrix}).
1537: We used plane (Householder) reflections
1538: to compute all pivoted $QR$-decompositions.
1539: We used the LAPACK 3.1.1 divide-and-conquer SVD routine {\tt dgesdd}
1540: to compute all full SVDs.
1541: For the parameter $l$, we set $l = 12$ $(= k+2)$
1542: for all of the examples reported here.
1543: 
1544: The experiments reported here and our further tests point
1545: to the following:
1546: 
1547: \begin{enumerate}
1548: %
1549: \item The accuracies in Table~1 are superior to those in Table~2;
1550: the algorithm performs much better with $i>0$.
1551: (The algorithms of~\cite{liberty-woolfe-martinsson-rokhlin-tygert},
1552: \cite{sarlos3}, and~\cite{sarlos4}
1553: for low-rank approximation are essentially the same as the algorithm used
1554: for Tables~1 and~2 when $i=0$.)
1555: %
1556: \item The accuracies in Table~1 are superior to the corresponding accuracies
1557: in Table~6; the algorithm of the present paper produces higher accuracy
1558: than the classical pivoted $QR$-decompositions for matrices whose spectra
1559: decay slowly (such as those matrices tested in the present section).
1560: %
1561: \item The accuracies in Tables~1--3 appear to be proportional
1562: to $m^{1/(4i+2)} \, \sigma_{k+1}$ for the algorithm
1563: of Subsection~\ref{main_algorithm},
1564: and to be proportional to $m^{1/(4i)} \, \sigma_{k+1}$ for the algorithm
1565: of Subsection~\ref{modified},
1566: in accordance with~(\ref{sort_of_svd}) and~(\ref{sort_of_svdmod}).
1567: The numerical results reported here, as well as our further experiments,
1568: indicate that the theoretical bound~(\ref{the_point}) on the accuracy
1569: should remain valid with a greatly reduced constant in the right-hand side,
1570: independent of the matrix $A$ being approximated.
1571: See item~6 below for a discussion of Tables~4 and~5.
1572: %
1573: \item The timings in Tables~1--5 are consistent with~(\ref{svd_costs}),
1574: as we could (and did) apply the Hadamard matrices $U^{(A)}$ and $V^{(A)}$
1575: in~(\ref{test_matrix}) to vectors via fast Walsh-Hadamard transforms
1576: at a cost of $\bigoh(m \, \log(m))$ floating-point operations
1577: per matrix-vector multiplication.
1578: %
1579: \item The quality of the pseudorandom number generator has almost no effect
1580: on the accuracy of the algorithm, nor does substituting uniform variates
1581: for the normal variates.
1582: %
1583: \item The accuracies in Table~5 are superior to those in Table~4,
1584: particularly when the $k^\th$ greatest singular value $\sigma_k$
1585: of the matrix $A$ being approximated is very small. Understandably,
1586: the algorithm of Subsection~\ref{main_algorithm} would seem to break down
1587: when $(\sigma_k)^{2i+1}$ is less than the machine precision,
1588: while $\sigma_k$ itself is not,
1589: unlike the blanczos algorithm of Subsection~\ref{blanczos}.
1590: When $(\sigma_k)^{2i+1}$ is much less than the machine precision,
1591: while $\sigma_k$ is not,
1592: the accuracy of blanczos in the presence of roundoff is similar to that
1593: of the algorithm of Subsection~\ref{main_algorithm} run with a reduced $i$.
1594: When $(\sigma_k)^{2i+1}$ is much greater than the machine precision,
1595: the accuracy of blanczos is similar to that of the algorithm
1596: of Subsection~\ref{main_algorithm} run with $i$ being the same as
1597: in the blanczos algorithm.
1598: Since the blanczos algorithm of Subsection~\ref{blanczos}
1599: is so tolerant of roundoff,
1600: we suspect that the blanczos algorithm is
1601: a better general-purpose black-box tool
1602: for the computation of principal component analyses,
1603: despite its somewhat higher cost as compared with the algorithms
1604: of Subsections~\ref{main_algorithm} and~\ref{modified}.
1605: %
1606: \end{enumerate}
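
\begin{remark}
As a rough arithmetic check of item~3 against the reported data
(an illustration based only on the first and last rows
of Tables~1 and~2, not a fit),
note that $m$ grows by the factor $524288/512 = 2^{10}$
between those rows, while $\sigma_{k+1} = .001$ stays fixed.
For $i=1$, the factor $m^{1/(4i+2)} = m^{1/6}$ grows by
%
\begin{displaymath}
\left( 2^{10} \right)^{1/6} = 2^{5/3} \approx 3.2,
\end{displaymath}
%
whereas the values of $\delta$ in Table~1 grow by $.0039/.0011 \approx 3.5$.
For $i=0$, the factor $m^{1/(4i+2)} = m^{1/2}$ grows by
%
\begin{displaymath}
\left( 2^{10} \right)^{1/2} = 2^5 = 32,
\end{displaymath}
%
whereas the values of $\delta$ in Table~2 grow by $.220/.012 \approx 18$.
Thus, the observed growth matches the predicted growth closely for $i=1$,
and is somewhat slower than predicted, though of the same order, for $i=0$.
\end{remark}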
1607: 
1608: 
1609: 
1610: \begin{remark}
1611: A MATLAB\registered\ implementation of the blanczos algorithm
1612: of Subsection~\ref{blanczos} is available on the file exchange at
{\tt http://www.mathworks.com} in the package entitled
1614: ``Principal Component Analysis.''
1615: \end{remark}
1616: 
1617: 
1618: 
1619: \section{Appendix}
1620: \label{appendix}
1621: 
1622: In this appendix, we restate and prove Lemmas~\ref{all_together2}
1623: and~\ref{probability_bounds2} from Section~\ref{apparatus}.
1624: 
1625: The following lemma, stated earlier as Lemma~\ref{all_together2}
1626: in Section~\ref{apparatus},
shows that the product $A \, Q \, Q^\T$
is a good approximation to the matrix $A$,
1630: provided that there exist matrices $G$ and $S$ such that
1631: %
1632: \begin{enumerate}
1633: %
1634: \item[1.] the columns of $Q$ are orthonormal,
1635: %
1636: \item[2.] $Q \, S$ is a good approximation to $(G \, (A \, A^\T)^i \, A)^\T$,
1637: and
1638: %
1639: \item[3.] there exists a matrix $F$ such that $\| F \|$ is not too large,
1640: and $F \, G \, (A \, A^\T)^i \, A$ is a good approximation to $A$.
1641: %
1642: \end{enumerate}
1643: 
1644: \begin{lemma}
1645: \label{all_together22}
1646: Suppose that $i$, $k$, $l$, $m$, and~$n$ are positive integers
1647: with $k \le l \le m \le n$.
1648: Suppose further that $A$ is a real $m \times n$ matrix,
1649: $Q$ is a real $n \times k$ matrix whose columns are orthonormal,
1650: $S$ is a real $k \times l$ matrix,
1651: $F$ is a real $m \times l$ matrix,
1652: and $G$ is a real $l \times m$ matrix.
1653: 
1654: Then,
1655: %
1656: \begin{equation}
1657: \label{reconstruction22}
1658: \| A \, Q \, Q^\T - A \|
1659: \le 2 \, \| F \, G \, (A \, A^\T)^i \, A - A \|
1660:   + 2 \, \| F \| \, \| Q \, S - (G \, (A \, A^\T)^i \, A)^\T \|.
1661: \end{equation}
1662: %
1663: \end{lemma}
1664: 
1665: \begin{proof}
1666: The proof is straightforward, but tedious, as follows.
1667: 
1668: To simplify notation, we define
1669: %
1670: \begin{equation}
1671: \label{shorter}
1672: B = (A \, A^\T)^i \, A.
1673: \end{equation}
1674: 
1675: We obtain from the triangle inequality that
1676: %
1677: \begin{multline}
1678: \label{triangle}
1679: \| A \, Q \, Q^\T - A \|
1680: \le \| A \, Q \, Q^\T - F \, G \, B \, Q \, Q^\T \|
1681:   + \| F \, G \, B \, Q \, Q^\T - F \, G \, B \| \\
1682:   + \| F \, G \, B - A \|.
1683: \end{multline}
1684: 
1685: First, we provide a bound
1686: for $\| A \, Q \, Q^\T - F \, G \, B \, Q \, Q^\T \|$.
1687: Clearly,
1688: %
1689: \begin{equation}
1690: \label{bound0}
1691: \| A \, Q \, Q^\T - F \, G \, B \, Q \, Q^\T \|
1692: \le \| A - F \, G \, B \| \, \| Q \| \, \| Q^\T \|.
1693: \end{equation}
1694: %
1695: It follows from the fact that the columns of $Q$ are orthonormal that
1696: %
1697: \begin{equation}
1698: \label{bound1}
1699: \| Q \| \le 1
1700: \end{equation}
1701: %
1702: and
1703: %
1704: \begin{equation}
1705: \label{bound2}
1706: \| Q^\T \| \le 1.
1707: \end{equation}
1708: %
1709: Combining~(\ref{bound0}), (\ref{bound1}), and~(\ref{bound2}) yields
1710: %
1711: \begin{equation}
1712: \label{simpler}
1713: \| A \, Q \, Q^\T - F \, G \, B \, Q \, Q^\T \| \le \| A - F \, G \, B \|.
1714: \end{equation}
1715: 
1716: Next, we provide a bound
1717: for $\| F \, G \, B \, Q \, Q^\T - F \, G \, B \|$.
1718: Clearly,
1719: %
1720: \begin{equation}
1721: \label{triangle4}
1722: \| F \, G \, B \, Q \, Q^\T - F \, G \, B \|
1723: \le \| F \| \, \| G \, B \, Q \, Q^\T - G \, B \|.
1724: \end{equation}
1725: %
1726: It follows from the triangle inequality that
1727: %
1728: \begin{multline}
1729: \label{triangle3}
1730: \| G \, B \, Q \, Q^\T - G \, B \|
1731: \le \| G \, B \, Q \, Q^\T - S^\T \, Q^\T \, Q \, Q^\T \| \\
1732:   + \| S^\T \, Q^\T \, Q \, Q^\T - S^\T \, Q^\T \|
1733:   + \| S^\T \, Q^\T - G \, B \|.
1734: \end{multline}
1735: 
1736: Furthermore,
1737: %
1738: \begin{equation}
1739: \label{prev}
1740: \| G \, B \, Q \, Q^\T - S^\T \, Q^\T \, Q \, Q^\T \|
1741: \le \| G \, B - S^\T \, Q^\T \| \, \| Q \| \, \| Q^\T \|.
1742: \end{equation}
1743: %
1744: Combining~(\ref{prev}), (\ref{bound1}), and~(\ref{bound2}) yields
1745: %
1746: \begin{equation}
1747: \label{bound4}
1748: \| G \, B \, Q \, Q^\T - S^\T \, Q^\T \, Q \, Q^\T \|
1749: \le \| G \, B - S^\T \, Q^\T \|.
1750: \end{equation}
1751: 
1752: Also, it follows from the fact that the columns of $Q$ are orthonormal that
1753: %
1754: \begin{equation}
1755: \label{orthonormal}
1756: Q^\T \, Q = \Id.
1757: \end{equation}
1758: %
1759: It follows from~(\ref{orthonormal}) that
1760: %
1761: \begin{equation}
1762: \label{vanish}
1763: \| S^\T \, Q^\T \, Q \, Q^\T - S^\T \, Q^\T \| = 0.
1764: \end{equation}
1765: %
1766: 
1767: Combining~(\ref{triangle3}), (\ref{bound4}), and~(\ref{vanish}) yields
1768: %
1769: \begin{equation}
1770: \label{triangle5}
1771: \| G \, B \, Q \, Q^\T - G \, B \| \le 2 \, \| S^\T \, Q^\T - G \, B \|.
1772: \end{equation}
1773: %
1774: Combining~(\ref{triangle4}) and~(\ref{triangle5}) yields
1775: %
1776: \begin{equation}
1777: \label{triangle6}
1778: \| F \, G \, B \, Q \, Q^\T - F \, G \, B \|
1779: \le 2 \, \| F \| \, \| S^\T \, Q^\T - G \, B \|.
1780: \end{equation}
1781: %
1782: 
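Combining~(\ref{triangle}), (\ref{simpler}), (\ref{triangle6}),
(\ref{shorter}), and the fact that a matrix and its transpose
have the same spectral norm, so that
$\| S^\T \, Q^\T - G \, B \| = \| Q \, S - (G \, B)^\T \|$,
yields~(\ref{reconstruction22}).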
1785: \end{proof}
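
\begin{remark}
Two elementary points in the proof of Lemma~\ref{all_together22}
may be spelled out as follows
(we assume here that $\| \cdot \|$ denotes the spectral norm
when applied to a matrix, and the Euclidean norm when applied to a vector).
First, the bounds~(\ref{bound1}) and~(\ref{bound2}) hold with equality:
since $Q^\T \, Q = \Id$, every real $k \times 1$ vector $x$ satisfies
%
\begin{displaymath}
\| Q \, x \|^2 = x^\T \, Q^\T \, Q \, x = x^\T \, x = \| x \|^2,
\end{displaymath}
%
so that $\| Q \| = 1$, and hence $\| Q^\T \| = \| Q \| = 1$,
as transposition preserves the spectral norm.
Second, as a sanity check on~(\ref{reconstruction22}),
suppose that both terms in its right-hand side vanish, that is,
$F \, G \, B = A$ and $Q \, S = (G \, B)^\T$,
with $B$ defined in~(\ref{shorter}).
Then $A = F \, S^\T \, Q^\T$, and $Q^\T \, Q = \Id$ yields
%
\begin{displaymath}
A \, Q \, Q^\T = F \, S^\T \, Q^\T \, Q \, Q^\T = F \, S^\T \, Q^\T = A,
\end{displaymath}
%
in agreement with the left-hand side of~(\ref{reconstruction22}) being zero.
\end{remark}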
1786: 
1787: 
1788: The following lemma, stated earlier as Lemma~\ref{probability_bounds2}
1789: in Section~\ref{apparatus}, shows that,
1790: for any positive integer $i$, matrix $A$, and matrix $G$ whose entries are
1791: i.i.d.\ Gaussian random variables of zero mean and unit variance,
1792: with very high probability there exists a matrix $F$
1793: with a reasonably small norm,
1794: such that $F \, G \, (A \, A^\T)^i \, A$ is a good approximation to $A$.
1795: This lemma is similar to Lemma~19 of~\cite{martinsson-rokhlin-tygert3}.
1796: 
1797: \begin{lemma}
1798: \label{probability_bounds22}
1799: Suppose that $i$, $j$, $k$, $l$, $m$, and~$n$ are positive integers
1800: with $j < k < l < m \le n$.
1801: Suppose further that $A$ is a real $m \times n$ matrix,
1802: $G$ is a real $l \times m$ matrix whose entries are
1803: i.i.d.\ Gaussian random variables of zero mean and unit variance,
1804: and $\beta$ and $\gamma$ are positive real numbers, such that
1805: the $j^\ith$ greatest singular value $\sigma_j$ of $A$ is positive,
1806: $\gamma > 1$, and
1807: %
1808: \begin{multline}
1809: \label{probability22}
1810: \Phi
1811:   = 1 - \frac{1}{\sqrt{2 \pi \, (l-j+1)}}
1812:  \, \left( \frac{e}{(l-j+1) \, \beta} \right)^{l-j+1} \\
1813:   - \frac{1}{4 \, (\gamma^2-1) \, \sqrt{\pi \, \max(m-k,l) \; \gamma^2}}
1814:     \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^{\max(m-k,\,l)} \\
1815:   - \frac{1}{4 \, (\gamma^2-1) \, \sqrt{\pi \, l \, \gamma^2}}
1816:     \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^l
1817: \end{multline}
1818: %
1819: is nonnegative.
1820: 
1821: Then, there exists a real $m \times l$ matrix $F$ such that
1822: %
1823: \begin{multline}
1824: \label{approximation22}
1825: \| F \, G \, (A \, A^\T)^i \, A - A \|
1826: \le \sqrt{ 2 l^2 \, \beta^2 \, \gamma^2 + 1 }
1827:  \;\; \sigma_{j+1} \\
1828:   + \sqrt{ 2 l \, \max(m-k,l) \, \beta^2 \, \gamma^2
1829:         \, \left( \frac{\sigma_{k+1}}{\sigma_j} \right)^{4i} + 1 }
1830:  \;\; \sigma_{k+1}
1831: \end{multline}
1832: %
1833: and
1834: %
1835: \begin{equation}
1836: \label{small_norm22}
1837: \| F \| \le \frac{\sqrt{l} \; \beta}{(\sigma_j)^{2i}}
1838: \end{equation}
1839: %
1840: with probability not less than $\Phi$ defined in~(\ref{probability22}),
1841: where $\sigma_j$ is the $j^\ith$ greatest singular value of $A$,
1842: $\sigma_{j+1}$ is the $(j+1)^\ist$ greatest singular value of $A$,
1843: and $\sigma_{k+1}$ is the $(k+1)^\ist$ greatest singular value of $A$.
1844: \end{lemma}
1845: 
1846: \begin{proof}
1847: We prove the existence of a matrix $F$ satisfying~(\ref{approximation22})
1848: and~(\ref{small_norm22}) by constructing one.
1849: 
1850: We start by forming an SVD of $A$,
1851: %
1852: \begin{equation}
1853: \label{svd2}
1854: A = U \, \Sigma \, V^\T,
1855: \end{equation}
1856: %
1857: where $U$ is a real unitary $m \times m$ matrix,
1858: $\Sigma$ is a real diagonal $m \times m$ matrix,
1859: and $V$ is a real $n \times m$ matrix whose columns are orthonormal, such that
1860: %
1861: \begin{equation}
1862: \label{ordering2}
1863: \Sigma_{p,p} = \sigma_p
1864: \end{equation}
1865: %
1866: for $p = 1$,~$2$, \dots, $m-1$,~$m$,
1867: where $\Sigma_{p,p}$ is the entry in row $p$ and column $p$ of $\Sigma$,
1868: and $\sigma_p$ is the $p^\th$ greatest singular value of $A$.
1869: 
1870: Next, we define auxiliary matrices
1871: $H$, $R$, $\Gamma$, $S$, $T$, $\Theta$, and $P$.
1872: We define $H$ to be the leftmost $l \times j$ block
1873: of the $l \times m$ matrix $G \, U$,
1874: $R$ to be the $l \times (k-j)$ block of $G \, U$
whose first column is the $(j+1)^\st$ column of $G \, U$,
1876: and $\Gamma$ to be the rightmost $l \times (m-k)$ block
1877: of $G \, U$, so that
1878: %
1879: \begin{equation}
1880: \label{partition2}
1881: G \, U = \left( \begin{array}{c|c|c} H & R & \Gamma \end{array} \right).
1882: \end{equation}
1883: %
1884: Combining the fact that $U$ is real and unitary,
1885: and the fact that the entries of $G$ are i.i.d.\ Gaussian
1886: random variables of zero mean and unit variance,
1887: we see that the entries of $H$ are also i.i.d.\ Gaussian
1888: random variables of zero mean and unit variance,
1889: as are the entries of $R$, and as are the entries of $\Gamma$.
1890: We define $H^{(-1)}$ to be the real $j \times l$ matrix
1891: given by the formula
1892: %
1893: \begin{equation}
1894: \label{definition_of_pseudoinverse2}
1895: H^{(-1)} = (H^\T \, H)^{-1} \, H^\T
1896: \end{equation}
1897: %
1898: ($H^\T \, H$ is invertible with high probability
1899: due to Lemma~\ref{least_value}).
1900: %
1901: We define $S$ to be the leftmost uppermost $j \times j$ block of $\Sigma$,
1902: $T$ to be the $(k-j) \times (k-j)$ block of $\Sigma$
1903: whose leftmost uppermost entry is the entry
1904: in the $(j+1)^\st$ row and $(j+1)^\st$ column of $\Sigma$,
1905: and $\Theta$ to be the rightmost lowermost $(m-k) \times (m-k)$ block
1906: of $\Sigma$, so that
1907: %
1908: \begin{equation}
1909: \label{svd_partition2}
1910: \Sigma
1911: = \left( \begin{array}{c|c|c} S   & \0s & \0s    \\\hline
1912:                               \0s & T   & \0s    \\\hline
1913:                               \0s & \0s & \Theta
1914:          \end{array} \right).
1915: \end{equation}
1916: %
1917: We define $P$ to be the real $m \times l$ matrix
1918: whose uppermost $j \times l$ block is the product $S^{-2i} \, H^{(-1)}$,
1919: whose entries are zero in the $(k-j) \times l$ block whose first row
1920: is the $(j+1)^\st$ row of $P$,
1921: and whose entries in the lowermost $(m-k) \times l$ block are zero,
1922: so that
1923: %
1924: \begin{equation}
1925: \label{pad2}
1926: P = \left( \begin{array}{c} S^{-2i} \, H^{(-1)} \\\hline \0s
1927:                                                 \\\hline \0s
1928:            \end{array} \right).
1929: \end{equation}
1930: 
1931: Finally, we define $F$ to be the $m \times l$ matrix given by
1932: %
1933: \begin{equation}
1934: \label{inverter2}
1935: F = U \, P = U \, \left( \begin{array}{c} S^{-2i} \, H^{(-1)} \\\hline
1936:                                           \0s \\\hline \0s
1937:                          \end{array} \right).
1938: \end{equation}
1939: 
1940: Combining~(\ref{definition_of_pseudoinverse2}), (\ref{pseudoinverse_norm}), 
1941: the fact that the entries of $H$ are i.i.d.\ Gaussian
1942: random variables of zero mean and unit variance,
1943: and Lemma~\ref{least_value} yields
1944: %
1945: \begin{equation}
1946: \label{pseudoinverse2}
1947: \left\| H^{(-1)} \right\| \le \sqrt{l} \; \beta
1948: \end{equation}
1949: %
1950: with probability not less than
1951: %
1952: \begin{equation}
1953: 1 - \frac{1}{\sqrt{2 \pi \, (l-j+1)}}
1954:  \, \left( \frac{e}{(l-j+1) \, \beta} \right)^{l-j+1}.
1955: \end{equation}
1956: %
1957: Combining~(\ref{inverter2}), (\ref{pseudoinverse2}), (\ref{svd_partition2}),
1958: (\ref{ordering2}), the fact that $\Sigma$ is zero off its main diagonal,
1959: and the fact that $U$ is unitary yields~(\ref{small_norm22}).
1960: 
1961: We now show that $F$ defined in~(\ref{inverter2})
1962: satisfies~(\ref{approximation22}).
1963: 
1964: Combining~(\ref{svd2}), (\ref{partition2}), and~(\ref{inverter2}) yields
1965: %
1966: \begin{equation}
1967: \label{simplification12}
1968: F \, G \, (A \, A^\T)^i \, A - A
1969: = U \, \left( \left( \begin{array}{c} S^{-2i} \, H^{(-1)} \\\hline
1970:                                       \0s \\\hline \0s
1971:                      \end{array} \right)
1972:               \left( \begin{array}{c|c|c} H & R & \Gamma \end{array} \right)
1973:               \, \Sigma^{2i}
1974:             - \Id \right) \, \Sigma \, V^\T.
1975: \end{equation}
1976: %
1977: Combining~(\ref{definition_of_pseudoinverse2})
1978: and~(\ref{svd_partition2}) yields
1979: %
1980: \begin{multline}
1981: \label{simplification22}
1982: \left( \left( \begin{array}{c} S^{-2i} \, H^{(-1)} \\\hline \0s
1983:                                                    \\\hline \0s
1984:               \end{array} \right)
1985:        \left( \begin{array}{c|c|c} H & R & \Gamma \end{array} \right)
1986:        \, \Sigma^{2i}
1987:      - \Id \right) \, \Sigma \\
1988: = \left( \begin{array}{c|c|c}
1989:          \0s & S^{-2i} \, H^{(-1)} \, R \; T^{2i+1} &
1990:                S^{-2i} \, H^{(-1)} \, \Gamma \, \Theta^{2i+1} \\\hline
1991:          \0s & -T & \0s \\\hline
1992:          \0s & \0s & -\Theta
1993:   \end{array} \right).
1994: \end{multline}
1995: %
1996: Furthermore,
1997: %
1998: \begin{multline}
1999: \label{Frobenius2}
2000: \left\| \left( \begin{array}{c|c|c}
2001:        \0s & S^{-2i} \, H^{(-1)} \, R \; T^{2i+1} &
2002:              S^{-2i} \, H^{(-1)} \, \Gamma \, \Theta^{2i+1} \\\hline
2003:        \0s & -T & \0s \\\hline
2004:        \0s & \0s & -\Theta
2005: \end{array} \right) \right\|^2 \\
2006: \le \left\| S^{-2i} \, H^{(-1)} \, R \, T^{2i+1} \right\|^2
2007:   + \left\| S^{-2i} \, H^{(-1)} \, \Gamma \, \Theta^{2i+1} \right\|^2
2008:   + \| T \|^2 + \| \Theta \|^2.
2009: \end{multline}
2010: 
2011: Moreover,
2012: %
2013: \begin{equation}
2014: \label{product_of_norms2}
2015: \left\| S^{-2i} \, H^{(-1)} \, R \, T^{2i+1} \right\|
2016: \le \left\| S^{-1} \right\|^{2i} \, \left\| H^{(-1)} \right\|
2017:  \, \| R \| \, \| T \|^{2i+1}
2018: \end{equation}
2019: %
2020: and
2021: %
2022: \begin{equation}
2023: \label{product_of_norms3}
2024: \left\| S^{-2i} \, H^{(-1)} \, \Gamma \, \Theta^{2i+1} \right\|
2025: \le \left\| S^{-1} \right\|^{2i} \, \left\| H^{(-1)} \right\|
2026:  \, \| \Gamma \| \, \| \Theta \|^{2i+1}.
2027: \end{equation}
2028: %
2029: Combining~(\ref{svd_partition2}) and~(\ref{ordering2}) yields
2030: %
2031: \begin{equation}
2032: \label{singular_value_bound1}
2033: \left\| S^{-1} \right\| \le \frac{1}{\sigma_j},
2034: \end{equation}
2035: %
2036: \begin{equation}
2037: \label{singular_value_bound2}
2038: \| T \| \le \sigma_{j+1},
2039: \end{equation}
2040: %
2041: and
2042: %
2043: \begin{equation}
2044: \label{singular_value_bound3}
2045: \| \Theta \| \le \sigma_{k+1}.
2046: \end{equation}
2047: %
2048: Combining~(\ref{simplification12})--(\ref{singular_value_bound3})
2049: and the fact that the columns of $U$ are orthonormal,
2050: as are the columns of $V$, yields
2051: %
2052: \begin{multline}
2053: \label{almost_there2}
2054: \| F \, G \, (A \, A^\T)^i \, A - A \|^2
2055: \le \left( \left\| H^{(-1)} \right\|^2 \, \| R \|^2
2056:         \, \left( \frac{\sigma_{j+1}}{\sigma_j} \right)^{4i} + 1 \right)
2057:  \, (\sigma_{j+1})^2 \\
2058:   + \left( \left\| H^{(-1)} \right\|^2 \, \| \Gamma \|^2
2059:         \, \left( \frac{\sigma_{k+1}}{\sigma_j} \right)^{4i} + 1 \right)
2060:  \, (\sigma_{k+1})^2.
2061: \end{multline}
2062: 
2063: Combining Lemma~\ref{greatest_value}
2064: and the fact that the entries of $R$ are
2065: i.i.d.\ Gaussian random variables of zero mean and unit variance,
2066: as are the entries of $\Gamma$, yields
2067: %
2068: \begin{equation}
2069: \label{residual2}
2070: \| R \| \le \sqrt{2l} \; \gamma
2071: \end{equation}
2072: %
2073: and
2074: %
2075: \begin{equation}
2076: \label{residual3}
2077: \| \Gamma \| \le \sqrt{2 \, \max(m-k,l)} \; \gamma,
2078: \end{equation}
2079: %
2080: with probability not less than
2081: %
2082: \begin{multline}
2083: 1 - \frac{1}{4 \, (\gamma^2-1) \, \sqrt{\pi \, \max(m-k,l) \, \gamma^2}}
2084:     \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^{\max(m-k,\,l)} \\
2085:   - \frac{1}{4 \, (\gamma^2-1) \, \sqrt{\pi \, l \, \gamma^2}}
2086:     \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^l.
2087: \end{multline}
2088: %
2089: Combining~(\ref{almost_there2}), (\ref{pseudoinverse2}),
2090: (\ref{residual2}), and~(\ref{residual3}) yields
2091: %
2092: \begin{multline}
2093: \label{pre-approximation2}
2094: \| F \, G \, (A \, A^\T)^i \, A - A \|^2
2095: \le \left( 2 l^2 \, \beta^2 \, \gamma^2
2096:         \, \left( \frac{\sigma_{j+1}}{\sigma_j} \right)^{4i} + 1 \right)
2097:  \, (\sigma_{j+1})^2 \\
2098:   + \left( 2 l \, \max(m-k,l) \, \beta^2 \, \gamma^2
2099:         \, \left( \frac{\sigma_{k+1}}{\sigma_j} \right)^{4i} + 1 \right)
2100:  \, (\sigma_{k+1})^2
2101: \end{multline}
2102: %
2103: with probability not less than $\Phi$ defined in~(\ref{probability22}).
2104: Combining~(\ref{pre-approximation2}),
2105: the fact that $\sigma_{j+1} \le \sigma_j$, and the fact that
2106: %
2107: \begin{equation}
2108: \sqrt{x + y} \le \sqrt{x} + \sqrt{y}
2109: \end{equation}
2110: %
2111: for any nonnegative real numbers $x$ and $y$
2112: yields~(\ref{approximation22}). 
2113: \end{proof}
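
\begin{remark}
For convenience, we record the intermediate computations
behind~(\ref{simplification12}) and the final step
of the proof of Lemma~\ref{probability_bounds22}.
Since the columns of $U$ are orthonormal, as are the columns of $V$,
the SVD~(\ref{svd2}) yields
%
\begin{displaymath}
(A \, A^\T)^i \, A
= (U \, \Sigma^2 \, U^\T)^i \, U \, \Sigma \, V^\T
= U \, \Sigma^{2i+1} \, V^\T,
\end{displaymath}
%
so that, with $P$ defined in~(\ref{pad2}) and $F = U \, P$,
%
\begin{displaymath}
F \, G \, (A \, A^\T)^i \, A - A
= U \, \left( P \, (G \, U) \, \Sigma^{2i} - \Id \right) \, \Sigma \, V^\T,
\end{displaymath}
%
which is~(\ref{simplification12}) after substituting~(\ref{partition2}).
The leftmost block column in the right-hand side
of~(\ref{simplification22}) vanishes because
$H^{(-1)} \, H = (H^\T \, H)^{-1} \, H^\T \, H = \Id$.
For the final step, $\sigma_{j+1} \le \sigma_j$ gives
$(\sigma_{j+1}/\sigma_j)^{4i} \le 1$,
so that the right-hand side of~(\ref{pre-approximation2})
is at most $x + y$, where
%
\begin{displaymath}
x = \left( 2 l^2 \, \beta^2 \, \gamma^2 + 1 \right) \, (\sigma_{j+1})^2,
\quad
y = \left( 2 l \, \max(m-k,l) \, \beta^2 \, \gamma^2
    \, \left( \frac{\sigma_{k+1}}{\sigma_j} \right)^{4i} + 1 \right)
 \, (\sigma_{k+1})^2;
\end{displaymath}
%
taking square roots and using $\sqrt{x+y} \le \sqrt{x} + \sqrt{y}$
then produces the two terms in the right-hand side
of~(\ref{approximation22}).
\end{remark}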
2114: 
2115: 
2116: 
2117: \section*{Acknowledgements}
2118: We thank Ming Gu for suggesting the combination
2119: of the Lanczos method with randomized methods
2120: for the low-rank approximation of matrices.
2121: We are grateful for many helpful discussions
2122: with R. Raphael Coifman and Yoel Shkolnisky.
2123: We thank the anonymous referees for their useful suggestions.
2124: 
2125: 
2126: 
2127: \begin{figure}[b]
2128: \begin{center}
2129: %
2130: %
2131: \begin{tabular}{r|r|c|r|r|r}
2132:    $m$ &     $n$ & $i$ &      $t$ & $\sigma_{k+1}$ & $\delta$ \\\hline
2133:                                                                 \hline
2134:    512 &    1024 &   1 & .13E--01 &           .001 &    .0011 \\\hline
2135:   2048 &    4096 &   1 & .56E--01 &           .001 &    .0013 \\\hline
2136:   8192 &   16384 &   1 & .25E--00 &           .001 &    .0018 \\\hline
2137:  32768 &   65536 &   1 &  .12E+01 &           .001 &    .0024 \\\hline
2138: 131072 &  262144 &   1 &  .75E+01 &           .001 &    .0037 \\\hline
2139: 524288 & 1048576 &   1 &  .36E+02 &           .001 &    .0039 \\\hline
2140: \end{tabular}
2141: %
2142: %
2143: \\\vspace{.125in}
2144: %
2145: Table~1: Five-step algorithm of Subsection~\ref{main_algorithm}
2146: %
2147: %
2148: \end{center}
2149: \end{figure}
2150: 
2151: 
2152: 
2153: \begin{figure}
2154: \begin{center}
2155: %
2156: %
2157: \begin{tabular}{r|r|c|r|r|r}
2158:     $m$ &    $n$ & $i$ &      $t$ & $\sigma_{k+1}$ & $\delta$ \\\hline
2159:                                                                 \hline
2160:    512 &    1024 &   0 & .14E--01 &           .001 &     .012 \\\hline
2161:   2048 &    4096 &   0 & .47E--01 &           .001 &     .027 \\\hline
2162:   8192 &   16384 &   0 & .22E--00 &           .001 &     .039 \\\hline
2163:  32768 &   65536 &   0 &  .10E+01 &           .001 &     .053 \\\hline
2164: 131072 &  262144 &   0 &  .60E+01 &           .001 &     .110 \\\hline
2165: 524288 & 1048576 &   0 &  .29E+02 &           .001 &     .220 \\\hline
2166: \end{tabular}
2167: %
2168: %
2169: \\\vspace{.125in}
2170: %
2171: Table~2: Five-step algorithm of Subsection~\ref{main_algorithm}
2172: %
2173: %
2174: \end{center}
2175: \end{figure}
2176: 
2177: 
2178: 
2179: \begin{figure}
2180: \begin{center}
2181: %
2182: %
2183: \begin{tabular}{r|r|c|r|r|r}
2184:    $m$ &     $n$ & $i$ &     $t$ & $\sigma_{k+1}$ & $\delta$ \\\hline
2185:                                                                \hline
2186: 524288 & 1048576 &   0 & .29E+02 &            .01 &     .862 \\\hline
2187: 524288 & 1048576 & (1) & .31E+02 &            .01 &     .091 \\\hline
2188: 524288 & 1048576 &   1 & .36E+02 &            .01 &     .037 \\\hline
2189: 524288 & 1048576 & (2) & .38E+02 &            .01 &     .025 \\\hline
2190: 524288 & 1048576 &   2 & .43E+02 &            .01 &     .022 \\\hline
2191: 524288 & 1048576 & (3) & .45E+02 &            .01 &     .015 \\\hline
2192: 524288 & 1048576 &   3 & .49E+02 &            .01 &     .010 \\\hline
2193: \end{tabular}
2194: %
2195: %
2196: \\\vspace{.125in}
2197: %
2198: Table~3: Five-step algorithms of Subsections~\ref{main_algorithm}
2199:          and~\ref{modified} \\\quad\quad\quad\quad
2200:          (parentheses around $i$ designate Subsection~\ref{modified})
2201: %
2202: %
2203: \end{center}
2204: \end{figure}
2205: 
2206: 
2207: 
2208: \begin{figure}
2209: \begin{center}
2210: %
2211: %
2212: \begin{tabular}{r|r|c|r|r|r}
2213:    $m$ &    $n$ & $i$ &     $t$ & $\sigma_{k+1}$ & $\delta$ \\\hline
2214:                                                               \hline
2215: 262144 & 524288 &   1 & .17E+02 &       .10E--02 & .39E--02 \\\hline
2216: 262144 & 524288 &   1 & .17E+02 &       .10E--04 & .10E--03 \\\hline
2217: 262144 & 524288 &   1 & .17E+02 &       .10E--06 & .25E--05 \\\hline
2218: 262144 & 524288 &   1 & .17E+02 &       .10E--08 & .90E--06 \\\hline
2219: 262144 & 524288 &   1 & .17E+02 &       .10E--10 & .55E--07 \\\hline
2220: 262144 & 524288 &   1 & .17E+02 &       .10E--12 & .51E--08 \\\hline
2221: 262144 & 524288 &   1 & .17E+02 &       .10E--14 & .10E--05 \\\hline
2222: \end{tabular}
2223: %
2224: %
2225: \\\vspace{.125in}
2226: %
2227: Table~4: Five-step algorithm of Subsection~\ref{main_algorithm}
2228: %
2229: %
2230: \end{center}
2231: \end{figure}
2232: 
2233: 
2234: 
2235: \begin{figure}
2236: \begin{center}
2237: %
2238: %
2239: \begin{tabular}{r|r|c|r|r|r}
2240:    $m$ &    $n$ & $i$ &     $t$ & $\sigma_{k+1}$ &   $\delta$ \\\hline
2241:                                                                 \hline
2242: 262144 & 524288 &   1 & .31E+02 &       .10E--02 &   .35E--02 \\\hline
2243: 262144 & 524288 &   1 & .31E+02 &       .10E--04 &   .15E--04 \\\hline
2244: 262144 & 524288 &   1 & .31E+02 &       .10E--06 &   .24E--05 \\\hline
2245: 262144 & 524288 &   1 & .31E+02 &       .10E--08 &   .11E--06 \\\hline
2246: 262144 & 524288 &   1 & .31E+02 &       .10E--10 &   .19E--08 \\\hline
2247: 262144 & 524288 &   1 & .31E+02 &       .10E--12 &   .25E--10 \\\hline
2248: 262144 & 524288 &   1 & .31E+02 &       .10E--14 &   .53E--11 \\\hline
2249: \end{tabular}
2250: %
2251: %
2252: \\\vspace{.125in}
2253: %
2254: Table~5: Five-step algorithm of Subsection~\ref{blanczos}
2255: %
2256: %
2257: \end{center}
2258: \end{figure}
2259: 
2260: 
2261: 
2262: \begin{figure}
2263: \begin{center}
2264: %
2265: %
2266: \begin{tabular}{r|r|r|r|r}
2267:  $m$ &  $n$ &      $t$ & $\sigma_{k+1}$ & $\delta$ \\\hline
2268:                                                      \hline
2269:  512 & 1024 & .60E--01 &           .001 &    .0047 \\\hline
2270: 1024 & 2048 & .29E--00 &           .001 &    .0065 \\\hline
2271: 2048 & 4096 &  .11E+01 &           .001 &    .0092 \\\hline
2272: 4096 & 8192 &  .43E+01 &           .001 &    .0131 \\\hline
2273: \end{tabular}
2274: %
2275: %
2276: \\\vspace{.125in}
2277: %
2278: Table~6: Pivoted $QR$-decomposition
2279: %
2280: %
2281: \end{center}
2282: \end{figure}
2283: 
2284: 
2285: 
2286: \begin{figure}
2287: \begin{center}
2288: %
2289: %
2290: \rotatebox{-90}{\scalebox{.28}{\includegraphics{plot}}}
2291: %
2292: %
2293: \\\vspace{.15in}
2294: %
2295: Figure~1: Singular values with $m = 512$, $n = 1024$, \\
2296:           and $\sigma_{k+1} = .001$
2297: %
2298: %
2299: \end{center}
2300: \end{figure}
2301: 
2302: 
2303: 
2304: \clearpage
2305: 
2306: 
2307: \bibliographystyle{siam}
2308: \bibliography{pca}
2309: 
2310: 
2311: \end{document}
2312: