0809:0809.2274/pca.tex

1: \documentclass[final]{siamltex}

2: \usepackage{amsmath}

3: \usepackage{graphicx}

4:

5: \def\T{{\hbox{\scriptsize{\rm T}}}}

6: \def\tinyT{{\hbox{\tiny{\rm T}}}}

7: \def\epsilon{\varepsilon}

8: \def\phi{\varphi}

9: \def\bigoh{\mathcal{O}}

10: \def\th{{\rm th}}

11: \def\ith{{\it th}}

12: \def\rd{{\rm rd}}

13: \def\ird{{\it rd}}

14: \def\nd{{\rm nd}}

15: \def\ind{{\it nd}}

16: \def\st{{\rm st}}

17: \def\ist{{\it st}}

18: \def\Id{{\bf 1}}

19: \def\0s{{\bf 0}}

20:

21: \def\registered{$^{\hbox{\ooalign{\hfil\raise .20ex\hbox{\textbf{\tiny R}}\hfil\crcr\mathhexbox20C}}}$}

22:

23: \newtheorem{observe}[theorem]{Observation}

24: \newtheorem{remark1}[theorem]{Remark}

25:

26: \newenvironment{observation}{\begin{observe} \rm}{\end{observe}}

27: \newenvironment{remark}{\begin{remark1} \rm}{\end{remark1}}

28:

29:

30: \title{A randomized algorithm for\\principal component analysis}

31: \author{Vladimir Rokhlin\thanks{Departments of Computer Science, Mathematics,

32: and Physics, Yale University, New Haven, CT 06511;

33: supported in part by DARPA/AFOSR Grant FA9550-07-1-0541.} \and

34: Arthur Szlam\thanks{Department of Mathematics, UCLA, Los Angeles, CA 90095-1555;

35: supported in part by NSF Grant DMS-0811203 ({\tt aszlam@math.ucla.edu}).} \and

36: Mark Tygert\thanks{Department of Mathematics, UCLA, Los Angeles, CA 90095-1555

37: ({\tt tygert@aya.yale.edu}).}

38: }

39:

40:

41: \begin{document}

42:

43: \maketitle

44:

45: \begin{abstract}

46: Principal component analysis (PCA) requires the computation

47: of a low-rank approximation to a matrix containing the data being analyzed.

48: In many applications of PCA, the best possible accuracy

49: of any rank-deficient approximation is at most a few digits

50: (measured in the spectral norm,

51: relative to the spectral norm of the matrix being approximated).

52: In such circumstances, efficient algorithms have not come

53: with guarantees of good accuracy,

54: unless one or both dimensions of the matrix being approximated are small.

55: We describe an efficient algorithm for the low-rank approximation of matrices

56: that produces accuracy very close to the best possible,

57: for matrices of arbitrary sizes.

58: We illustrate our theoretical results via several numerical examples.

59: \end{abstract}

60:

61: \begin{keywords}

62: PCA, singular value decomposition, SVD, low rank, Lanczos, power

63: \end{keywords}

64:

65: \begin{AMS}

66: 65F15, 65C60, 68W20

67: \end{AMS}

68:

69:

70: \pagestyle{myheadings}

71: \thispagestyle{plain}

72: \markboth{ROKHLIN, SZLAM, AND TYGERT}{A RANDOMIZED ALGORITHM FOR PCA}

73:

74:

75:

76: \section{Introduction}

77:

78: Principal component analysis (PCA) is among the most widely used techniques

79: in statistics, data analysis, and data mining.

80: PCA is the basis of many machine learning methods,

81: including the latent semantic analysis

82: of large databases of text and HTML documents described

83: in~\cite{deerwester-dumais-furnas-landauer-harshman}.

84: Computationally, PCA amounts to the low-rank approximation of a matrix

85: containing the data being analyzed.

86: The present article describes an algorithm

87: for the low-rank approximation of matrices, suitable for PCA.

88: This paper demonstrates both theoretically and via numerical examples

89: that the algorithm efficiently produces low-rank approximations

90: whose accuracies are very close to the best possible.

91:

92: The canonical construction of the best possible rank-$k$ approximation

93: to a real $m \times n$ matrix $A$ uses the singular value decomposition (SVD)

94: of $A$,

95: %

96: \begin{equation}

97: \label{full_svd}

98: A = U \, \Sigma \, V^\T,

99: \end{equation}

100: %

101: where $U$ is a real unitary $m \times m$ matrix,

102: $V$ is a real unitary $n \times n$ matrix,

103: and $\Sigma$ is a real $m \times n$ matrix whose only nonzero entries

104: appear in nonincreasing order on the diagonal and are nonnegative.

105: The diagonal entries $\sigma_1$,~$\sigma_2$,

106: \dots, $\sigma_{\min(m,n)-1}$,~$\sigma_{\min(m,n)}$

107: of $\Sigma$ are known as the singular values of $A$.

108: The best rank-$k$ approximation to $A$, with $k < m$ and $k < n$, is

109: %

110: \begin{equation}

111: \label{low_rank_approx}

112: A \approx \tilde{U} \, \tilde{\Sigma} \, \tilde{V}^\T,

113: \end{equation}

114: %

115: where $\tilde{U}$ is the leftmost $m \times k$ block of $U$,

116: $\tilde{V}$ is the leftmost $n \times k$ block of $V$,

117: and $\tilde{\Sigma}$ is the $k \times k$ matrix

118: whose only nonzero entries appear in nonincreasing order on the diagonal

119: and are the $k$ greatest singular values of $A$.

120: This approximation is ``best'' in the sense that

121: the spectral norm $\| A - B \|$ of the difference between $A$

122: and a rank-$k$ matrix $B$ is minimal

123: for $B = \tilde{U} \, \tilde{\Sigma} \, \tilde{V}^\T$.

124: In fact,

125: %

126: \begin{equation}

127: \| A - \tilde{U} \, \tilde{\Sigma} \, \tilde{V}^\T \| = \sigma_{k+1},

128: \end{equation}

129: %

130: where $\sigma_{k+1}$ is the $(k+1)^\st$ greatest singular value of $A$.

131: For more information about the SVD, see, for example,

132: Chapter~8 in~\cite{golub-van_loan}.
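As a numerical illustration of~(\ref{low_rank_approx}) and of the identity $\| A - \tilde{U} \, \tilde{\Sigma} \, \tilde{V}^\T \| = \sigma_{k+1}$, the following minimal sketch truncates a full SVD to rank $k$ and measures the spectral-norm error. It is our illustration, not part of the algorithm of the present paper. It assumes NumPy, and the sizes, the seed, and the helper name \texttt{best\_rank\_k} are arbitrary.

```python
import numpy as np

def best_rank_k(A, k):
    """Hypothetical helper: rank-k truncation of the SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return B, s

rng = np.random.default_rng(0)
m, n, k = 40, 60, 5
A = rng.standard_normal((m, n))
B, s = best_rank_k(A, k)

# np.linalg.norm(M, 2) is the spectral norm (greatest singular value) of M.
err = np.linalg.norm(A - B, 2)
print(err, s[k])  # s[k] is sigma_{k+1}, since NumPy indexes from 0
```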

133:

134: For definiteness, let us assume that $m \le n$

135: and that $A$ is an arbitrary (dense) real $m \times n$ matrix.

136: To compute a rank-$k$ approximation to $A$,

137: one might form the matrices $U$, $\Sigma$, and $V$ in~(\ref{full_svd}),

138: and then use them to construct $\tilde{U}$, $\tilde{\Sigma}$, and $\tilde{V}$

139: in~(\ref{low_rank_approx}).

140: However, even computing just $\Sigma$, $U$,

141: and the leftmost $m$ columns of $V$ requires at least

142: $\bigoh(n m^2)$ floating-point operations (flops) using any

143: of the standard algorithms

144: (see, for example, Chapter~8 in~\cite{golub-van_loan}).

145: Alternatively, one might use pivoted $QR$-decomposition algorithms,

146: which require $\bigoh(nmk)$ flops

147: and typically produce a rank-$k$ approximation $B$ to $A$ such that

148: %

149: \begin{equation}

150: \label{gu_bound}

151: \| A - B \| \le 10 \sqrt{m} \; \sigma_{k+1},

152: \end{equation}

153: %

154: where $\|A-B\|$ is the spectral norm of $A-B$,

155: and $\sigma_{k+1}$ is the $(k+1)^\st$ greatest singular value of $A$

156: (see, for example, Chapter~5 in~\cite{golub-van_loan}).

157: Furthermore, the algorithms of~\cite{gu-eisenstat} require only

158: about $\bigoh(nmk)$ flops to produce a rank-$k$ approximation that

159: (unlike an approximation produced by a pivoted $QR$-decomposition)

160: is guaranteed to satisfy a bound nearly as strong as~(\ref{gu_bound}).
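For comparison, the following sketch forms a rank-$k$ approximation from $k$ steps of a pivoted $QR$-decomposition. It assumes the classical column-pivoting rule (at each step, pick the column of the current residual with the greatest Euclidean norm), implemented with Gram--Schmidt in NumPy. The helper \texttt{pivoted\_qr\_rank\_k} is hypothetical. The approximation is $B = Q \, Q^\T A$, where the columns of $Q$ are the orthonormalized pivot columns, so the printed ratio $\|A - B\| / \sigma_{k+1}$ can never be less than~1.

```python
import numpy as np

def pivoted_qr_rank_k(A, k):
    """Hypothetical helper: rank-k approximation Q Q^T A via column-pivoted Gram-Schmidt."""
    m, n = A.shape
    R = np.array(A, dtype=float)            # residual, orthogonal to chosen columns
    Q = np.zeros((m, k))
    pivots = []
    for j in range(k):
        norms = np.linalg.norm(R, axis=0)
        p = int(np.argmax(norms))           # column with the largest residual norm
        q = R[:, p] / norms[p]
        q -= Q[:, :j] @ (Q[:, :j].T @ q)    # reorthogonalize for numerical safety
        q /= np.linalg.norm(q)
        Q[:, j] = q
        R -= np.outer(q, q @ R)             # remove the new direction from every column
        pivots.append(p)
    return Q @ (Q.T @ A), pivots

rng = np.random.default_rng(1)
m, n, k = 50, 80, 6
A = rng.standard_normal((m, n))
B, pivots = pivoted_qr_rank_k(A, k)
s = np.linalg.svd(A, compute_uv=False)
ratio = np.linalg.norm(A - B, 2) / s[k]     # >= 1 by the optimality of the SVD
print(pivots, ratio)
```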

161:

162: While the accuracy in~(\ref{gu_bound}) is sufficient

163: for many applications of low-rank approximation,

164: PCA often involves $m \ge$ 10,000,

165: and a ``signal-to-noise ratio'' $\sigma_1/\sigma_{k+1} \le 100$,

166: where $\sigma_1 = \|A\|$ is the greatest singular value of $A$,

167: and $\sigma_{k+1}$ is the $(k+1)^\st$ greatest.

168: Moreover, the singular values $\le \sigma_{k+1}$

169: often arise from noise in the process generating the data in $A$,

170: making the singular values of $A$ decay so slowly that

171: $\sigma_m \ge \sigma_{k+1}/10$.

172: When $m \ge$ 10,000, $\sigma_1/\sigma_{k+1} \le 100$,

173: and $\sigma_m \ge \sigma_{k+1}/10$, the rank-$k$ approximation $B$ produced

174: by a pivoted $QR$-decomposition algorithm

175: typically satisfies $\| A - B \| \sim \| A \|$

176: --- the ``approximation'' $B$ is effectively unrelated

177: to the matrix $A$ being approximated!

178: For large matrices whose ``signal-to-noise ratio''

179: $\sigma_1/\sigma_{k+1}$ is less than 10,000,

180: the $\sqrt{m}$ factor in~(\ref{gu_bound}) may be unacceptable.

181: Now, pivoted $QR$-decomposition algorithms are not the only algorithms

182: that can compute a rank-$k$ approximation using $\bigoh(nmk)$ flops.

183: However, other algorithms, such as those of

184: \cite{achlioptas-mcsherry0}, \cite{achlioptas-mcsherry}, \cite{chan-hansen},

185: \cite{clarkson-woodruff}, \cite{deshpande-rademacher-vempala-wang},

186: \cite{deshpande-vempala}, \cite{drineas-drinea-huggins},

187: \cite{drineas-kannan-mahoney2}, \cite{drineas-kannan-mahoney3},

188: \cite{drineas-mahoney-muthukrishnan1}, \cite{drineas-mahoney-muthukrishnan2},

189: \cite{friedland-kaveh-niknejad-zare}, \cite{frieze-kannan},

190: \cite{frieze-kannan-vempala0}, \cite{frieze-kannan-vempala},

191: \cite{goreinov-tyrtyshnikov}, \cite{goreinov-tyrtyshnikov-zamarashkin2},

192: \cite{goreinov-tyrtyshnikov-zamarashkin1}, \cite{gu-eisenstat},

193: \cite{har-peled},

194: \cite{liberty-woolfe-martinsson-rokhlin-tygert}, \cite{mahoney-drineas},

195: \cite{papadimitriou-raghavan-tamaki-vempala},

196: \cite{sarlos3}, \cite{sarlos4}, \cite{sun-xie-zhang-faloutsos},

197: \cite{tyrtyshnikov}, and~\cite{woolfe-liberty-rokhlin-tygert},

198: also yield accuracies involving factors of at least $\sqrt{m}$

199: when the singular values $\sigma_{k+1}$, $\sigma_{k+2}$, $\sigma_{k+3}$, \dots\

200: of $A$ decay slowly.

201: (The decay is rather slow if, for example,

202: $\sigma_{k+j} \sim j^\alpha \, \sigma_{k+1}$

203: for $j = 1$,~$2$,~$3$, \dots, with $-1/2 < \alpha \le 0$.

204: Many of these other algorithms are designed to produce approximations

205: having special properties not treated in the present paper,

206: and their spectral-norm accuracy is good when the singular values decay

207: sufficiently fast. Fairly recent surveys of algorithms

208: for low-rank approximation are available in~\cite{sarlos3}, \cite{sarlos4},

209: and~\cite{liberty-woolfe-martinsson-rokhlin-tygert}.)

210:

211: The algorithm described in the present paper produces

212: a rank-$k$ approximation $B$ to $A$ such that

213: %

214: \begin{equation}

215: \label{very_rough}

216: \| A - B \| \le C \, m^{1/(4i+2)} \, \sigma_{k+1}

217: \end{equation}

218: %

219: with very high probability (typically $1-10^{-15}$, independent of $A$,

220: with the choice of parameters from Remark~\ref{par_remark} below),

221: where $\|A-B\|$ is the spectral norm of $A-B$,

222: $i$ is a nonnegative integer specified by the user,

223: $\sigma_{k+1}$ is the $(k+1)^\st$ greatest singular value of $A$,

224: and $C$ is a constant independent of $A$

225: that theoretically may depend on the parameters of the algorithm.

226: (Numerical evidence such as that in Section~\ref{numerical}

227: suggests at the very least that $C < 10$;

228: formulas~(\ref{explicit_eval}) and~(\ref{the_point})

229: in Section~\ref{algorithm} provide more complicated theoretical bounds on $C$.)

230: The algorithm requires $\bigoh(nmki)$ floating-point operations when $i>0$.

231: In many applications of PCA, $i = 1$ or $i = 2$ is sufficient,

232: and the algorithm then requires only $\bigoh(nmk)$ flops.

233: The algorithm provides the rank-$k$ approximation $B$ in the form of an SVD,

234: outputting three matrices, $\tilde{U}$, $\tilde{\Sigma}$, and $\tilde{V}$,

235: such that $B = \tilde{U} \, \tilde{\Sigma} \, \tilde{V}^\T$,

236: where the columns of $\tilde{U}$ are orthonormal,

237: the columns of $\tilde{V}$ are orthonormal,

238: and the entries of $\tilde{\Sigma}$ are all nonnegative

239: and zero off the diagonal.
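To see how quickly the factor $m^{1/(4i+2)}$ in~(\ref{very_rough}) shrinks as $i$ increases, the following short Python computation tabulates it for $m = 10{,}000$. The case $i = 0$ gives $\sqrt{m}$, the same factor as in~(\ref{gu_bound}).

```python
m = 10_000  # the matrix size mentioned above for typical PCA problems
factors = {i: m ** (1.0 / (4 * i + 2)) for i in range(4)}
for i, factor in factors.items():
    print(f"i = {i}: m^(1/(4i+2)) = {factor:.2f}")
```

For $m = 10{,}000$ the factor is $100$ for $i = 0$, about $4.64$ for $i = 1$, about $2.51$ for $i = 2$, and about $1.93$ for $i = 3$.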

240:

241: The algorithm of the present paper is randomized,

242: but succeeds with very high probability;

243: for example, the bound~(\ref{explicit_eval}) on its accuracy holds

244: with probability greater than $1-10^{-15}$.

245: The algorithm is similar to many recently discussed randomized algorithms

246: for low-rank approximation, but produces approximations of higher accuracy

247: when the singular values $\sigma_{k+1}$, $\sigma_{k+2}$, $\sigma_{k+3}$, \dots\

248: of the matrix being approximated decay slowly; see, for example, \cite{sarlos3}

249: or~\cite{liberty-woolfe-martinsson-rokhlin-tygert}.

250: The algorithm is a variant of that in~\cite{roweis},

251: and the analysis of the present paper should extend to the algorithm

252: of~\cite{roweis}; \cite{roweis} stimulated the authors' collaboration.

253: The algorithm may be regarded as a generalization

254: of the randomized power methods of~\cite{dixon}

255: and~\cite{kuczynski-wozniakowski},

256: and in fact we use the latter to ascertain the approximations' accuracy

257: rapidly and reliably.

258:

259: The algorithm admits obvious ``out-of-core'' and parallel implementations

260: (assuming that the user chooses the parameter $i$ in~(\ref{very_rough})

261: to be reasonably small).

262: As with the algorithms of~\cite{dixon}, \cite{kuczynski-wozniakowski},

263: \cite{liberty-woolfe-martinsson-rokhlin-tygert},

264: \cite{martinsson-rokhlin-tygert3}, \cite{roweis},

265: \cite{sarlos3}, and~\cite{sarlos4},

266: the core steps of the algorithm of the present paper

267: involve the application of the matrix $A$ being approximated

268: and its transpose $A^\T$ to random vectors.

269: The algorithm is more efficient when $A$ and $A^\T$ can be applied rapidly

270: to arbitrary vectors, such as when $A$ is sparse.

271:

272: Throughout the present paper, we use $\Id$ to denote an identity matrix.

273: We use $\0s$ to denote a matrix whose entries are all zeros.

274: For any matrix $A$, we use $\|A\|$ to denote the spectral norm of $A$,

275: that is, $\|A\|$ is the greatest singular value of $A$.

276: Furthermore, the entries of all matrices in the present paper are real valued,

277: though the algorithm and analysis extend trivially to matrices

278: whose entries are complex valued.

279:

280: The present paper has the following structure:

281: Section~\ref{prelims} collects together various known facts

282: which later sections utilize.

283: Section~\ref{apparatus} provides the principal lemmas used in bounding

284: the accuracy of the algorithm in Section~\ref{algorithm}.

285: Section~\ref{algorithm} describes the algorithm of the present paper.

286: Section~\ref{numerical} illustrates the performance of the algorithm

287: via several numerical examples.

288: The appendix, Section~\ref{appendix}, proves two lemmas stated earlier

289: in Section~\ref{apparatus}.

290: We encourage the reader to begin with Sections~\ref{algorithm}

291: and~\ref{numerical}, referring back to the relevant portions

292: of Sections~\ref{prelims} and~\ref{apparatus} as they are referenced.

293:

294:

295:

296: \section{Preliminaries}

297: \label{prelims}

298:

299: In this section, we summarize various facts about matrices and functions.

300: Subsection~\ref{general_singular_values} discusses the singular values

301: of arbitrary matrices. Subsection~\ref{random_singular_values}

302: discusses the singular values of certain random matrices.

303: Subsection~\ref{monotone} observes that a certain function is monotone.

304:

305:

306: \subsection{Singular values of general matrices}

307: \label{general_singular_values}

308:

309:

310: The following trivial technical lemma will be needed

311: in Section~\ref{apparatus}.

312:

313: \begin{lemma}

314: Suppose that $m$ and $n$ are positive integers with $m \ge n$.

315: Suppose further that $A$ is a real $m \times n$ matrix

316: such that the least (that is, the $n^\ith$ greatest) singular value $\sigma_n$

317: of $A$ is nonzero.

318:

319: Then,

320: %

321: \begin{equation}

322: \label{pseudoinverse_norm}

323: \left\| (A^\T \, A)^{-1} \, A^\T \right\| = \frac{1}{\sigma_n}.

324: \end{equation}

325: %

326: \end{lemma}
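The identity~(\ref{pseudoinverse_norm}) is easy to check numerically. The following is a minimal NumPy sketch. It assumes a Gaussian random $m \times n$ matrix with $m \ge n$, whose least singular value is nonzero with probability~1. The sizes and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 30, 12
A = rng.standard_normal((m, n))

# (A^T A)^{-1} A^T, formed with a linear solve rather than an explicit inverse.
P = np.linalg.solve(A.T @ A, A.T)
sigma_n = np.linalg.svd(A, compute_uv=False)[-1]   # least singular value of A
lhs = np.linalg.norm(P, 2)
print(lhs, 1.0 / sigma_n)
```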

327:

328:

329: The following lemma states that the greatest singular value of a matrix $A$

330: is at least as large as the greatest singular value

331: of any rectangular block of entries in $A$;

332: the lemma is a straightforward consequence

333: of the minimax properties of singular values

334: (see, for example, Section~47 of Chapter~2 in~\cite{wilkinson}).

335:

336: \begin{lemma}

337: \label{minimax_consequence}

338: Suppose that $k$, $l$, $m$, and~$n$ are positive integers

339: with $k \le m$ and $l \le n$.

340: Suppose further that $A$ is a real $m \times n$ matrix,

341: and $B$ is a $k \times l$ rectangular block of entries in $A$.

342:

343: Then, the greatest singular value of $B$ is at most

344: the greatest singular value of $A$.

345: \end{lemma}
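A quick numerical check of Lemma~\ref{minimax_consequence} follows. It assumes NumPy and uses randomly chosen contiguous rectangular blocks of a Gaussian random matrix. The sizes, seed, and number of trials are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((20, 25))
norm_A = np.linalg.norm(A, 2)

block_norms = []
for _ in range(200):
    # Random contiguous block A[r0:r1, c0:c1] with at least one row and column.
    r0, c0 = rng.integers(0, 20), rng.integers(0, 25)
    r1, c1 = rng.integers(r0 + 1, 21), rng.integers(c0 + 1, 26)
    block_norms.append(np.linalg.norm(A[r0:r1, c0:c1], 2))
print(max(block_norms), norm_A)
```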

346:

347:

348: The following classical lemma provides an approximation $Q \, S$

349: to an $n \times l$ matrix $R$

350: via an $n \times k$ matrix $Q$ whose columns are orthonormal,

351: and a $k \times l$ matrix $S$.

352: As remarked in Observation~\ref{least_squares},

353: the proof of this lemma provides a classic algorithm for computing $Q$ and $S$,

354: given $R$. We include the proof since we will be using this algorithm.

355:

356: \begin{lemma}

357: Suppose that $k$, $l$, and $n$ are positive integers with $k < l \le n$,

358: and $R$ is a real $n \times l$ matrix.

359:

360: Then, there exist a real $n \times k$ matrix $Q$

361: whose columns are orthonormal,

362: and a real $k \times l$ matrix $S$, such that

363: %

364: \begin{equation}

365: \label{svd_qr}

366: \| Q \, S - R \| \le \rho_{k+1},

367: \end{equation}

368: %

369: where $\rho_{k+1}$ is the $(k+1)^\ist$ greatest singular value of $R$.

370: \end{lemma}

371:

372: \begin{proof}

373: We start by forming an SVD of $R$,

374: %

375: \begin{equation}

376: \label{little_svd}

377: R = U \, \Sigma \, V^\T,

378: \end{equation}

379: %

380: where $U$ is a real $n \times l$ matrix whose columns are orthonormal,

381: $V$ is a real $l \times l$ matrix whose columns are orthonormal,

382: and $\Sigma$ is a real diagonal $l \times l$ matrix, such that

383: %

384: \begin{equation}

385: \label{little_ordering}

386: \Sigma_{j,j} = \rho_j

387: \end{equation}

388: %

389: for $j = 1$,~$2$, \dots, $l-1$,~$l$,

390: where $\Sigma_{j,j}$ is the entry in row $j$ and column $j$ of $\Sigma$,

391: and $\rho_j$ is the $j^\th$ greatest singular value of $R$.

392: We define $Q$ to be the leftmost $n \times k$ block of $U$,

393: and $P$ to be the rightmost $n \times (l-k)$ block of $U$, so that

394: %

395: \begin{equation}

396: \label{left_sing}

397: U = \left( \begin{array}{c|c} Q & P \end{array} \right).

398: \end{equation}

399: %

400: We define $S$ to be the uppermost $k \times l$ block of $\Sigma \, V^\T$,

401: and $T$ to be the lowermost $(l-k) \times l$ block of $\Sigma \, V^\T$,

402: so that

403: %

404: \begin{equation}

405: \label{right_sing}

406: \Sigma \, V^\T = \left( \begin{array}{c} S \\\hline T \end{array} \right).

407: \end{equation}

408: %

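409: Combining~(\ref{little_svd}), (\ref{left_sing}), and~(\ref{right_sing}) yields $Q \, S - R = -P \, T$.

410: Since the columns of $V$ are orthonormal, (\ref{little_ordering}) shows that

411: the singular values of $T$ are $\rho_{k+1}$, \dots, $\rho_l$; since the columns of $P$ are orthonormal,

412: $\| Q \, S - R \| = \| P \, T \| = \| T \| = \rho_{k+1}$, which yields~(\ref{svd_qr}).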

413: \end{proof}

414:

415:

416: \begin{observation}

417: \label{least_squares}

418: In order to compute the matrices $Q$ and $S$ in~(\ref{svd_qr})

419: from the matrix $R$,

420: we can construct~(\ref{little_svd}),

421: and then form $Q$ and $S$

422: according to~(\ref{left_sing}) and~(\ref{right_sing}).

423: (See, for example, Chapter~8 in~\cite{golub-van_loan} for details

424: concerning the computation of the SVD.)

425: \end{observation}
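The construction in Observation~\ref{least_squares} translates into a few lines of NumPy. In the following sketch, $Q$ and $S$ are formed from a thin SVD of $R$ according to~(\ref{left_sing}) and~(\ref{right_sing}), and the error $\|Q\,S - R\|$ is compared with $\rho_{k+1}$. The helper name \texttt{svd\_qs} is ours.

```python
import numpy as np

def svd_qs(R, k):
    """Hypothetical helper: Q (n x k, orthonormal columns) and S (k x l) from an SVD of R."""
    U, rho, Vt = np.linalg.svd(R, full_matrices=False)   # R = U diag(rho) V^T
    Q = U[:, :k]                        # leftmost n x k block of U
    S = (rho[:, None] * Vt)[:k, :]      # uppermost k x l block of Sigma V^T
    return Q, S, rho

rng = np.random.default_rng(4)
n, l, k = 40, 10, 4
R = rng.standard_normal((n, l))
Q, S, rho = svd_qs(R, k)
err = np.linalg.norm(Q @ S - R, 2)
print(err, rho[k])   # rho[k] is rho_{k+1}
```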

426:

427:

428:

429: \subsection{Singular values of random matrices}

430: \label{random_singular_values}

431:

432:

433: The following lemma provides a highly probable upper bound

434: on the greatest singular value

435: of a square matrix whose entries are independent, identically distributed

436: (i.i.d.) Gaussian random variables of zero mean and unit variance;

437: Formula~8.8 in~\cite{goldstine-von_neumann} provides an equivalent formulation

438: of the lemma.

439:

440: \begin{lemma}

441: \label{greatest_bound}

442: Suppose that $n$ is a positive integer,

443: $G$ is a real $n \times n$ matrix whose entries are

444: i.i.d.\ Gaussian random variables of zero mean and unit variance,

445: and $\gamma$ is a positive real number, such that $\gamma > 1$ and

446: %

447: \begin{equation}

448: \label{failure_prob}

449: 1 - \frac{1}{4 \, (\gamma^2-1) \, \sqrt{\pi n \gamma^2}}

450:     \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^n

451: \end{equation}

452: %

453: is nonnegative.

454:

455: Then, the greatest singular value of $G$ is at most $\sqrt{2n} \, \gamma$

456: with probability not less than the amount in~(\ref{failure_prob}).

457: \end{lemma}
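The following sketch evaluates the quantity in~(\ref{failure_prob}) for one illustrative choice of $n$ and $\gamma$, which is ours and not taken from the present paper. It then compares the threshold $\sqrt{2n}\,\gamma$ with the greatest singular values of sampled $n \times n$ Gaussian matrices, drawn with NumPy.

```python
import math
import numpy as np

def greatest_bound_prob(n, gamma):
    """Quantity (failure_prob): lower bound on Pr{ ||G|| <= sqrt(2n) gamma }."""
    ratio = 2.0 * gamma**2 / math.exp(gamma**2 - 1.0)
    return 1.0 - ratio**n / (4.0 * (gamma**2 - 1.0) * math.sqrt(math.pi * n * gamma**2))

n, gamma = 20, 2.0
p = greatest_bound_prob(n, gamma)

rng = np.random.default_rng(5)
threshold = math.sqrt(2 * n) * gamma
norms = [np.linalg.norm(rng.standard_normal((n, n)), 2) for _ in range(200)]
print(p, threshold, max(norms))
```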

458:

459:

460: Combining Lemmas~\ref{minimax_consequence} and~\ref{greatest_bound}

461: yields the following lemma,

462: providing a highly probable upper bound on the greatest singular value

463: of a rectangular matrix whose entries are i.i.d.\ Gaussian

464: random variables of zero mean and unit variance.

465:

466: \begin{lemma}

467: \label{greatest_value}

468: Suppose that $l$, $m$, and $n$ are positive integers

469: with $n \ge l$ and $n \ge m$.

470: Suppose further that $G$ is a real $l \times m$ matrix whose entries are

471: i.i.d.\ Gaussian random variables of zero mean and unit variance,

472: and $\gamma$ is a positive real number, such that

473: $\gamma > 1$ and~(\ref{failure_prob}) is nonnegative.

474:

475: Then, the greatest singular value of $G$ is at most $\sqrt{2n} \, \gamma$

476: with probability not less than the amount in~(\ref{failure_prob}).

477: \end{lemma}

478:

479:

480: The following lemma provides a highly probable lower bound

481: on the least singular value

482: of a rectangular matrix whose entries are i.i.d.\ Gaussian

483: random variables of zero mean and unit variance;

484: Formula~2.5 in~\cite{chen-dongarra}

485: and the proof of Lemma~4.1 in~\cite{chen-dongarra}

486: together provide an equivalent formulation of Lemma~\ref{least_value}.

487:

488: \begin{lemma}

489: \label{least_value}

490: Suppose that $j$ and $l$ are positive integers with $j \le l$.

491: Suppose further that $G$ is a real $l \times j$ matrix whose entries are

492: i.i.d.\ Gaussian random variables of zero mean and unit variance,

493: and $\beta$ is a positive real number, such that

494: %

495: \begin{equation}

496: \label{failure_prob2}

497: 1 - \frac{1}{\sqrt{2 \pi \, (l-j+1)}}

498:  \, \left( \frac{e}{(l-j+1) \, \beta} \right)^{l-j+1}

499: \end{equation}

500: %

501: is nonnegative.

502:

503: Then, the least (that is, the $j^\ith$ greatest) singular value

504: of $G$ is at least $1 / (\sqrt{l} \; \beta)$

505: with probability not less than the amount in~(\ref{failure_prob2}).

506: \end{lemma}
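Similarly, the following sketch evaluates~(\ref{failure_prob2}) for illustrative values of $j$, $l$, and $\beta$, which we chose. It then compares the threshold $1/(\sqrt{l}\,\beta)$ with the least singular values of sampled $l \times j$ Gaussian matrices.

```python
import math
import numpy as np

def least_value_prob(j, l, beta):
    """Quantity (failure_prob2): lower bound on Pr{ sigma_j(G) >= 1/(sqrt(l) beta) }."""
    d = l - j + 1
    return 1.0 - (math.e / (d * beta)) ** d / math.sqrt(2.0 * math.pi * d)

j, l, beta = 10, 22, 1.0
p = least_value_prob(j, l, beta)

rng = np.random.default_rng(6)
threshold = 1.0 / (math.sqrt(l) * beta)
least = [np.linalg.svd(rng.standard_normal((l, j)), compute_uv=False)[-1]
         for _ in range(200)]
print(p, threshold, min(least))
```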

507:

508:

509:

510: \subsection{A monotone function}

511: \label{monotone}

512:

513:

514: The following technical lemma will be needed

515: in Section~\ref{algorithm}.

516:

517: \begin{lemma}

518: \label{monotonicity}

519: Suppose that $\alpha$ is a nonnegative real number,

520: and $f$ is the function defined on $(0,\infty)$ via the formula

521: %

522: \begin{equation}

523: f(x) = \frac{1}{\sqrt{2 \pi x}} \left( \frac{e\alpha}{x} \right)^x.

524: \end{equation}

525:

526: Then, $f$ decreases monotonically for $x > \alpha$.

527: \end{lemma}

528:

529: \begin{proof}

530: The derivative of $f$ is

531: %

532: \begin{equation}

533: \label{derivative}

534: f'(x) = f(x) \left( \ln\left(\frac{\alpha}{x}\right) - \frac{1}{2x} \right)

535: \end{equation}

536: %

537: for any positive real number $x$.

538: The right-hand side of~(\ref{derivative}) is negative when $x > \alpha$.

539: \end{proof}
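The derivative~(\ref{derivative}) and the monotonicity can be spot-checked numerically. This sketch uses plain Python with an arbitrary $\alpha = 3$. It compares~(\ref{derivative}) with central finite differences on a grid of points $x > \alpha$.

```python
import math

def f(x, alpha):
    """The function f of Lemma monotonicity."""
    return (math.e * alpha / x) ** x / math.sqrt(2.0 * math.pi * x)

def f_prime(x, alpha):
    """Formula (derivative): f'(x) = f(x) (ln(alpha / x) - 1 / (2x))."""
    return f(x, alpha) * (math.log(alpha / x) - 1.0 / (2.0 * x))

alpha, h = 3.0, 1e-6
xs = [alpha + 0.25 * t for t in range(1, 41)]      # grid on (alpha, alpha + 10]
values = [f(x, alpha) for x in xs]
central = [(f(x + h, alpha) - f(x - h, alpha)) / (2.0 * h) for x in xs]
exact = [f_prime(x, alpha) for x in xs]
print(max(abs(c - d) for c, d in zip(central, exact)))
```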

540:

541:

542:

543: \section{Mathematical apparatus}

544: \label{apparatus}

545:

546: In this section, we provide lemmas to be used in Section~\ref{algorithm}

547: in bounding the accuracy of the algorithm of the present paper.

548:

549: The following lemma, proven in the appendix (Section~\ref{appendix}),

550: shows that the product $A \, Q \, Q^\T$

551: is a good approximation to a matrix $A$,

552: provided that the following three conditions hold

553: for some matrices $G$ and $S$:

554: %

555: \begin{enumerate}

556: %

557: \item[1.] the columns of $Q$ are orthonormal,

558: %

559: \item[2.] $Q \, S$ is a good approximation to $(G \, (A \, A^\T)^i \, A)^\T$,

560: and

561: %

562: \item[3.] there exists a matrix $F$ such that $\| F \|$ is not too large,

563: and $F \, G \, (A \, A^\T)^i \, A$ is a good approximation to $A$.

564: %

565: \end{enumerate}

566:

567: \begin{lemma}

568: \label{all_together2}

569: Suppose that $i$, $k$, $l$, $m$, and~$n$ are positive integers

570: with $k \le l \le m \le n$.

571: Suppose further that $A$ is a real $m \times n$ matrix,

572: $Q$ is a real $n \times k$ matrix whose columns are orthonormal,

573: $S$ is a real $k \times l$ matrix,

574: $F$ is a real $m \times l$ matrix,

575: and $G$ is a real $l \times m$ matrix.

576:

577: Then,

578: %

579: \begin{equation}

580: \label{reconstruction2}

581: \| A \, Q \, Q^\T - A \|

582: \le 2 \, \| F \, G \, (A \, A^\T)^i \, A - A \|

583:   + 2 \, \| F \| \, \| Q \, S - (G \, (A \, A^\T)^i \, A)^\T \|.

584: \end{equation}

585: %

586: \end{lemma}
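Inequality~(\ref{reconstruction2}) holds for any matrices satisfying the hypotheses of Lemma~\ref{all_together2}, so it can be spot-checked numerically. In the following NumPy sketch, $Q$ and $S$ come from the SVD of $(G\,(A\,A^\T)^i\,A)^\T$ as in Observation~\ref{least_squares}. Purely for illustration, $F$ is the least-squares choice $F = A\,(G\,(A\,A^\T)^i\,A)^+$, where ${}^+$ denotes the pseudoinverse; the lemma itself allows any $F$.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, k, l, i = 30, 50, 4, 10, 1                      # k <= l <= m <= n
A = rng.standard_normal((m, n))
G = rng.standard_normal((l, m))

Y = G @ np.linalg.matrix_power(A @ A.T, i) @ A        # l x n
U, rho, Vt = np.linalg.svd(Y.T, full_matrices=False)  # Y^T = U diag(rho) V^T
Q = U[:, :k]                                          # n x k, orthonormal columns
S = (rho[:, None] * Vt)[:k, :]                        # k x l
F = A @ np.linalg.pinv(Y)                             # m x l, illustrative choice

def spec(M):
    return np.linalg.norm(M, 2)                       # spectral norm

lhs = spec(A @ Q @ Q.T - A)
rhs = 2 * spec(F @ Y - A) + 2 * spec(F) * spec(Q @ S - Y.T)
print(lhs, rhs)
```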

587:

588:

589: The following lemma, proven in the appendix (Section~\ref{appendix}),

590: states that,

591: for any positive integer $i$, matrix $A$, and matrix $G$ whose entries are

592: i.i.d.\ Gaussian random variables of zero mean and unit variance,

593: with very high probability there exists a matrix $F$

594: with a reasonably small norm,

595: such that $F \, G \, (A \, A^\T)^i \, A$ is a good approximation to $A$.

596: This lemma is similar to Lemma~19 of~\cite{martinsson-rokhlin-tygert3}.

597:

598: \begin{lemma}

599: \label{probability_bounds2}

600: Suppose that $i$, $j$, $k$, $l$, $m$, and~$n$ are positive integers

601: with $j < k < l < m \le n$.

602: Suppose further that $A$ is a real $m \times n$ matrix,

603: $G$ is a real $l \times m$ matrix whose entries are

604: i.i.d.\ Gaussian random variables of zero mean and unit variance,

605: and $\beta$ and $\gamma$ are positive real numbers, such that

606: the $j^\ith$ greatest singular value $\sigma_j$ of $A$ is positive,

607: $\gamma > 1$, and

608: %

609: \begin{multline}

610: \label{probability2}

611: \Phi

612:   = 1 - \frac{1}{\sqrt{2 \pi \, (l-j+1)}}

613:  \, \left( \frac{e}{(l-j+1) \, \beta} \right)^{l-j+1} \\

614:   - \frac{1}{4 \, (\gamma^2-1) \, \sqrt{\pi \, \max(m-k,l) \; \gamma^2}}

615:     \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^{\max(m-k,\,l)} \\

616:   - \frac{1}{4 \, (\gamma^2-1) \, \sqrt{\pi \, l \, \gamma^2}}

617:     \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^l

618: \end{multline}

619: %

620: is nonnegative.

621:

622: Then, there exists a real $m \times l$ matrix $F$ such that

623: %

624: \begin{multline}

625: \label{approximation2}

626: \| F \, G \, (A \, A^\T)^i \, A - A \|

627: \le \sqrt{ 2 l^2 \, \beta^2 \, \gamma^2 + 1 }

628:  \;\; \sigma_{j+1} \\

629:   + \sqrt{ 2 l \, \max(m-k,l) \, \beta^2 \, \gamma^2

630:         \, \left( \frac{\sigma_{k+1}}{\sigma_j} \right)^{4i} + 1 }

631:  \;\; \sigma_{k+1}

632: \end{multline}

633: %

634: and

635: %

636: \begin{equation}

637: \label{small_norm2}

638: \| F \| \le \frac{\sqrt{l} \; \beta}{(\sigma_j)^{2i}}

639: \end{equation}

640: %

641: with probability not less than $\Phi$ defined in~(\ref{probability2}),

642: where $\sigma_j$ is the $j^\ith$ greatest singular value of $A$,

643: $\sigma_{j+1}$ is the $(j+1)^\ist$ greatest singular value of $A$,

644: and $\sigma_{k+1}$ is the $(k+1)^\ist$ greatest singular value of $A$.

645: \end{lemma}
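The quantity $\Phi$ in~(\ref{probability2}) is tedious to evaluate by hand. The following sketch computes $1 - \Phi$ directly, since $1 - \Phi$ can fall below the double-precision resolution of numbers near~1. The parameter values are illustrative choices of ours, not those of Remark~\ref{par_remark}.

```python
import math

def gaussian_norm_tail(d, gamma):
    """One subtracted term of (probability2), with exponent d."""
    ratio = 2.0 * gamma**2 / math.exp(gamma**2 - 1.0)
    return ratio**d / (4.0 * (gamma**2 - 1.0) * math.sqrt(math.pi * d * gamma**2))

def phi_failure(j, k, l, m, beta, gamma):
    """1 - Phi from (probability2); requires j < k < l < m."""
    d = l - j + 1
    t1 = (math.e / (d * beta)) ** d / math.sqrt(2.0 * math.pi * d)
    t2 = gaussian_norm_tail(max(m - k, l), gamma)
    t3 = gaussian_norm_tail(l, gamma)
    return t1 + t2 + t3

fail = phi_failure(j=9, k=10, l=22, m=1000, beta=3.0, gamma=2.5)
print(fail)
```

With a much smaller $\beta$, such as $\beta = 0.1$, the returned value exceeds~1, so $\Phi$ is negative and the lemma does not apply.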

646:

647:

648: Given a matrix $A$,

649: and a matrix $G$ whose entries are i.i.d.\ Gaussian random variables

650: of zero mean and unit variance,

651: the following lemma provides a highly probable upper bound

652: on the singular values of the product $G \, A$

653: in terms of the singular values of $A$.

654: This lemma is reproduced from~\cite{martinsson-rokhlin-tygert3},

655: where it appears as Lemma~20.

656:

657: \begin{lemma}

658: \label{singular_value_stretching}

659: Suppose that $j$, $k$, $l$, $m$, and~$n$ are positive integers

660: with $k < l$, such that $k + j < m$ and $k + j < n$.

661: Suppose further that $A$ is a real $m \times n$ matrix,

662: $G$ is a real $l \times m$ matrix whose entries are

663: i.i.d.\ Gaussian random variables of zero mean and unit variance,

664: and $\gamma$ is a positive real number, such that

665: $\gamma > 1$ and

666: %

667: \begin{multline}

668: \label{probability3}

669: \Xi

670:   = 1 - \frac{1}{4 \, (\gamma^2-1) \, \sqrt{\pi \, \max(m-k-j,l) \, \gamma^2}}

671:     \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^{\max(m-k-j,\,l)} \\

672:   - \frac{1}{4 \, (\gamma^2-1) \, \sqrt{\pi \, \max(k+j,l) \; \gamma^2}}

673:     \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^{\max(k+j,\,l)}

674: \end{multline}

675: %

676: is nonnegative.

677:

678: Then,

679: %

680: \begin{equation}

681: \label{stretched_singular_value}

682: \rho_{k+1} \le \sqrt{2 \, \max(k+j,l)} \; \gamma \; \sigma_{k+1}

683:              + \sqrt{2 \, \max(m-k-j,l)} \; \gamma \; \sigma_{k+j+1}

684: \end{equation}

685: %

686: with probability not less than $\Xi$ defined in~(\ref{probability3}),

687: where $\rho_{k+1}$ is the $(k+1)^\ist$ greatest singular value of $G \, A$,

688: $\sigma_{k+1}$ is the $(k+1)^\ist$ greatest singular value of $A$,

689: and $\sigma_{k+j+1}$ is the $(k+j+1)^\ist$ greatest singular value of $A$.

690: \end{lemma}
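The following is an empirical check of~(\ref{stretched_singular_value}), assuming NumPy. The sketch builds a test matrix $A$ with prescribed singular values $\sigma_r = 2^{-r}$, an arbitrary choice. It then draws $G$ and compares $\rho_{k+1}$ with the right-hand side of~(\ref{stretched_singular_value}) for $\gamma = 2.5$.

```python
import math
import numpy as np

rng = np.random.default_rng(8)
m, n, k, j, l, gamma = 60, 80, 5, 10, 10, 2.5    # k < l, k + j < m, k + j < n

# A = U diag(sigma) V^T with sigma_r = 2^{-r}, r = 1, ..., m.
sigma = 2.0 ** -np.arange(1, m + 1)
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, m)))
A = U @ np.diag(sigma) @ V.T

G = rng.standard_normal((l, m))
rho = np.linalg.svd(G @ A, compute_uv=False)

bound = (math.sqrt(2 * max(k + j, l)) * gamma * sigma[k]            # sigma_{k+1}
         + math.sqrt(2 * max(m - k - j, l)) * gamma * sigma[k + j])  # sigma_{k+j+1}
print(rho[k], bound)   # rho[k] is rho_{k+1}
```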

691:

692:

693: The following corollary follows immediately from the preceding lemma,

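694: by replacing the matrix $A$ with $(A \, A^\T)^i \, A$, whose $r^\th$ greatest singular value is $(\sigma_r)^{2i+1}$,

695: the integer $k$ with $j$, and the integer $j$ with $k-j$ (note that $\max(k,l) = l$, since $k < l$).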

696:

697: \begin{corollary}

698: \label{singular_value_stretching2}

699: Suppose that $i$, $j$, $k$, $l$, $m$, and~$n$ are positive integers

700: with $j < k < l < m \le n$.

701: Suppose further that $A$ is a real $m \times n$ matrix,

702: $G$ is a real $l \times m$ matrix whose entries are

703: i.i.d.\ Gaussian random variables of zero mean and unit variance,

704: and $\gamma$ is a positive real number, such that

705: $\gamma > 1$ and

706: %

707: \begin{multline}

708: \label{probability32}

709: \Psi

710:   = 1 - \frac{1}{4 \, (\gamma^2-1) \, \sqrt{\pi \, \max(m-k,l) \, \gamma^2}}

711:     \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^{\max(m-k,\,l)} \\

712:   - \frac{1}{4 \, (\gamma^2-1) \, \sqrt{\pi \, l \; \gamma^2}}

713:     \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^l

714: \end{multline}

715: %

716: is nonnegative.

717:

718: Then,

719: %

720: \begin{equation}

721: \label{stretched_singular_value2}

722: \rho_{j+1} \le \sqrt{2 l} \; \gamma \; (\sigma_{j+1})^{2i+1}

723:              + \sqrt{2 \, \max(m-k,l)} \; \gamma \; (\sigma_{k+1})^{2i+1}

724: \end{equation}

725: %

726: with probability not less than $\Psi$ defined in~(\ref{probability32}),

727: where $\rho_{j+1}$ is the $(j+1)^\ist$ greatest singular value

728: of $G \, (A \, A^\T)^i \, A$,

729: $\sigma_{j+1}$ is the $(j+1)^\ist$ greatest singular value of $A$,

730: and $\sigma_{k+1}$ is the $(k+1)^\ist$ greatest singular value of $A$.

731: \end{corollary}

732:

733:

734:

735: \section{Description of the algorithm}

736: \label{algorithm}

737:

738: In this section, we describe the algorithm of the present paper,

739: providing details about its accuracy and computational costs.

740: Subsection~\ref{main_algorithm} describes the basic algorithm.

741: Subsection~\ref{costs} tabulates the computational costs of the algorithm.

742: Subsection~\ref{modified} describes a complementary algorithm.

743: Subsection~\ref{blanczos} describes a computationally more expensive variant

744: that is somewhat more accurate and tolerant to roundoff.

745:

746:

747:

748: \subsection{The algorithm}

749: \label{main_algorithm}

750:

751: Suppose that $i$, $k$, $m$, and $n$ are positive integers

752: with $2k < m \le n$, and $A$ is a real $m \times n$ matrix.

753: In this subsection, we will construct an approximation to an SVD of $A$

754: such that

755: %

756: \begin{equation}

757: \label{sort_of_svd}

758: \| A - U \, \Sigma \, V^\T \| \le C \, m^{1/(4i+2)} \, \sigma_{k+1}

759: \end{equation}

760: %

761: with very high probability,

762: where $U$ is a real $m \times k$ matrix

763: whose columns are orthonormal,

764: $V$ is a real $n \times k$ matrix whose columns are orthonormal,

765: $\Sigma$ is a real diagonal $k \times k$ matrix

766: whose entries are all nonnegative,

767: $\sigma_{k+1}$ is the $(k+1)^\st$ greatest singular value of $A$,

768: and $C$ is a constant independent of $A$ that depends on the parameters

769: of the algorithm.

770: (Section~\ref{numerical} will give an empirical indication of the size of $C$,

771: and~(\ref{explicit_eval}) will give one of our best theoretical estimates

772: to date.)

773:

774: Intuitively, we could apply $A^\T$ to several random vectors,

775: in order to identify the part of its range corresponding

776: to the larger singular values.

777: To enhance the decay of the singular values,

778: we apply $A^\T \, (A \, A^\T)^i$ instead.

779: Once we have identified most of the range of $A^\T$,

780: we perform several linear-algebraic manipulations in order to recover

781: an approximation to $A$.

782: (It is possible to obtain a similar, somewhat less accurate algorithm

783: by substituting our short, fat matrix $A$ for $A^\T$, and $A^\T$ for $A$.)

784:

More precisely, we choose an integer $l > k$ such that $l \le m-k$
(for example, $l = k + 12$), and perform the following five steps:

787:

788: \begin{enumerate}

789: %

790: \item[1.] Using a random number generator,

791: form a real $l \times m$ matrix $G$ whose entries are

792: i.i.d.\ Gaussian random variables of zero mean and unit variance,

793: and compute the $l \times n$ product matrix

794: %

795: \begin{equation}

796: \label{product2}

797: R = G \, (A \, A^\T)^i \, A.

798: \end{equation}

799: %

800: \item[2.] Using an SVD,

801: form a real $n \times k$ matrix $Q$ whose columns are orthonormal,

802: such that there exists a real $k \times l$ matrix $S$ for which

803: %

804: \begin{equation}

805: \label{good_approx2}

806: \| Q \, S - R^\T \| \le \rho_{k+1},

807: \end{equation}

808: %

809: where $\rho_{k+1}$ is the $(k+1)^\st$ greatest singular value of $R$.

810: (See Observation~\ref{least_squares} for details concerning

811: the construction of such a matrix $Q$.)

812: %

813: \item[3.] Compute the $m \times k$ product matrix

814: %

815: \begin{equation}

816: \label{product_t}

817: T = A \, Q.

818: \end{equation}

819: %

820: \item[4.] Form an SVD of $T$,

821: %

822: \begin{equation}

823: \label{svd_small}

824: T = U \, \Sigma \, W^\T,

825: \end{equation}

826: %

827: where $U$ is a real $m \times k$ matrix whose columns are orthonormal,

828: $W$ is a real $k \times k$ matrix whose columns are orthonormal,

829: and $\Sigma$ is a real diagonal $k \times k$ matrix

830: whose entries are all nonnegative.

831: (See, for example, Chapter~8 in~\cite{golub-van_loan} for details

832: concerning the construction of such an SVD.)

833: %

834: \item[5.] Compute the $n \times k$ product matrix

835: %

836: \begin{equation}

837: \label{product3}

838: V = Q \, W.

839: \end{equation}

840: %

841: \end{enumerate}
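The five steps above can be sketched in Python with NumPy as follows.
This is a minimal illustration under our own assumptions,
not the Fortran~77 implementation used in Section~\ref{numerical};
the function name {\tt randomized\_pca} and its arguments are hypothetical.
In Step~2 the sketch takes $Q$ to consist of the $k$ leading left singular
vectors of $R^\T$, which is one way to satisfy~(\ref{good_approx2}),
with $S$ formed from the corresponding singular values
and right singular vectors of $R^\T$.

\begin{verbatim}
import numpy as np


def randomized_pca(A, k, l, i, rng=None):
    """Sketch of the five-step algorithm.

    Returns U (m x k), s (length k) and V (n x k) such that
    A is approximated by U @ np.diag(s) @ V.T.
    """
    rng = np.random.default_rng() if rng is None else rng
    m = A.shape[0]
    # Step 1: R = G (A A^T)^i A, where G is l x m with i.i.d. N(0,1) entries.
    R = rng.standard_normal((l, m)) @ A
    for _ in range(i):
        R = (R @ A.T) @ A
    # Step 2: Q holds the k leading left singular vectors of R^T (n x k),
    # so that ||Q S - R^T|| equals the (k+1)st singular value of R.
    Q = np.linalg.svd(R.T, full_matrices=False)[0][:, :k]
    # Step 3: T = A Q (m x k).
    T = A @ Q
    # Step 4: SVD of the small matrix T = U diag(s) W^T.
    U, s, Wt = np.linalg.svd(T, full_matrices=False)
    # Step 5: V = Q W (n x k).
    V = Q @ Wt.T
    return U, s, V
\end{verbatim}

With $l = k + 12$ and $i = 1$, for instance, one would call
{\tt randomized\_pca(A, k, k + 12, 1)} and form $U \, \Sigma \, V^\T$
as {\tt (U * s) @ V.T}.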

842:

843:

844: The following theorem states precisely

845: that the matrices $U$, $\Sigma$, and $V$ satisfy~(\ref{sort_of_svd}).

846: See~(\ref{explicit_eval}) for a more compact (but less general) formulation.

847:

848: \begin{theorem}

849: \label{the_theorem}

850: Suppose that $i$, $k$, $l$, $m$, and $n$ are positive integers

851: with $k < l \le m-k$ and $m \le n$, and $A$ is a real $m \times n$ matrix.

852: Suppose further that $\beta$ and $\gamma$ are positive real numbers

853: such that $\gamma>1$,

854: %

855: \begin{equation}

856: \label{monotonicity_assump}

857: (l-k+1) \, \beta \ge 1,

858: \end{equation}

859: %

860: \begin{equation}

861: \label{simplifying_assump}

862: 2 \, l^2 \, \gamma^2 \, \beta^2 \ge 1,

863: \end{equation}

864: %

865: and

866: %

867: \begin{multline}

868: \label{final_prob}

869: \Pi = 1 - \frac{1}{2 \, (\gamma^2-1) \, \sqrt{\pi \, (m-k) \, \gamma^2}}

870:       \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^{m-k}

871:     - \frac{1}{2 \, (\gamma^2-1) \, \sqrt{\pi \, l \; \gamma^2}}

872:       \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^l \\

873:     - \frac{1}{\sqrt{2 \pi \, (l-k+1)}}

874:    \, \left( \frac{e}{(l-k+1) \, \beta} \right)^{l-k+1}

875: \end{multline}

876: %

877: is nonnegative.

878: Suppose in addition that $U$, $\Sigma$, and $V$ are the matrices

879: produced via the five-step algorithm of the present subsection, given above.

880:

881: Then,

882: %

883: \begin{equation}

884: \label{the_point}

885: \| A - U \, \Sigma \, V^\T \| \le 16 \, \gamma \, \beta \, l

886: \, \left(\frac{m-k}{l}\right)^{1/(4i+2)} \, \sigma_{k+1}

887: \end{equation}

888: %

889: with probability not less than $\Pi$,

890: where $\Pi$ is defined in~(\ref{final_prob}),

891: and $\sigma_{k+1}$ is the $(k+1)^\ist$ greatest singular value of $A$.

892: \end{theorem}

893:

894: \begin{proof}

895: Observing that $U \, \Sigma \, V^\T = A \, Q \, Q^\T$,

896: it is sufficient to prove that

897: %

898: \begin{equation}

899: \label{intermediate_step}

900: \| A \, Q \, Q^\T - A \| \le 16 \, \gamma \, \beta \, l

901: \, \left(\frac{m-k}{l}\right)^{1/(4i+2)} \, \sigma_{k+1}

902: \end{equation}

903: %

with probability not less than $\Pi$,

905: where $Q$ is the matrix from~(\ref{good_approx2}),

906: since combining~(\ref{intermediate_step}), (\ref{product_t}),

907: (\ref{svd_small}), and~(\ref{product3}) yields~(\ref{the_point}).

908: We now prove~(\ref{intermediate_step}).

909:

910: First, we consider the case when

911: %

912: \begin{equation}

913: \label{first_case}

914: \| A \| \le \left(\frac{m-k}{l}\right)^{1/(4i+2)} \, \sigma_{k+1}.

915: \end{equation}

916: %

917: Clearly,

918: %

919: \begin{equation}

920: \label{triangle_submult}

921: \| A \, Q \, Q^\T - A \| \le \| A \| \, \| Q \| \, \| Q^\T \| + \| A \|.

922: \end{equation}

923: %

Moreover, since the columns of $Q$ are orthonormal,

925: %

926: \begin{equation}

927: \label{normortho1}

928: \| Q \| \le 1

929: \end{equation}

930: %

931: and

932: %

933: \begin{equation}

934: \label{normortho2}

935: \| Q^\T \| \le 1.

936: \end{equation}

937: %

938: Combining~(\ref{triangle_submult}), (\ref{normortho1}), (\ref{normortho2}),

939: (\ref{first_case}), and~(\ref{simplifying_assump})

940: yields~(\ref{intermediate_step}), completing the proof

941: for the case when~(\ref{first_case}) holds.

942:

943: For the remainder of the proof, we consider the case when

944: %

945: \begin{equation}

946: \label{second_case}

947: \| A \| > \left(\frac{m-k}{l}\right)^{1/(4i+2)} \, \sigma_{k+1}.

948: \end{equation}

949: %

950: To prove~(\ref{intermediate_step}),

951: we will use~(\ref{reconstruction2})

952: (which is restated and proven in Lemma~\ref{all_together22} in the appendix),

953: namely,

954: %

955: \begin{equation}

956: \label{basic_bound}

957: \| A \, Q \, Q^\T - A \|

958: \le 2 \, \| F \, G \, (A \, A^\T)^i \, A - A \|

959:   + 2 \, \| F \| \, \| Q \, S - (G \, (A \, A^\T)^i \, A)^\T \|

960: \end{equation}

961: %

962: for any real $m \times l$ matrix $F$,

963: where $G$ is from~(\ref{product2}),

964: and $Q$ and $S$ are from~(\ref{good_approx2}).

965: We now choose an appropriate matrix $F$.

966:

967: First, we define $j$ to be the positive integer such that

968: %

969: \begin{equation}

970: \label{reduced_rank}

971: \sigma_{j+1} \le \left(\frac{m-k}{l}\right)^{1/(4i+2)} \, \sigma_{k+1}

972:                < \sigma_j,

973: \end{equation}

974: %

975: where $\sigma_j$ is the $j^\th$ greatest singular value of $A$,

and $\sigma_{j+1}$ is the $(j+1)^\st$ greatest singular value of $A$

977: (such an integer $j$ exists due to~(\ref{second_case})

978: and the supposition of the theorem that $l \le m-k$).

979: We then use the matrix $F$ from~(\ref{approximation2})

980: and~(\ref{small_norm2}) associated with this integer $j$, so that

981: (as stated in~(\ref{approximation2}) and~(\ref{small_norm2}),

982: which are restated and proven in Lemma~\ref{probability_bounds22}

983: in the appendix)

984: %

985: \begin{multline}

986: \label{number1}

987: \| F \, G \, (A \, A^\T)^i \, A - A \|

988: \le \sqrt{ 2 l^2 \, \beta^2 \, \gamma^2 + 1 }

989:  \;\; \sigma_{j+1} \\

990:   + \sqrt{ 2 l \, \max(m-k,l) \, \beta^2 \, \gamma^2

991:         \, \left( \frac{\sigma_{k+1}}{\sigma_j} \right)^{4i} + 1 }

992:  \;\; \sigma_{k+1}

993: \end{multline}

994: %

995: and

996: %

997: \begin{equation}

998: \label{number2}

999: \| F \| \le \frac{\sqrt{l} \; \beta}{(\sigma_j)^{2i}}

1000: \end{equation}

1001: %

1002: with probability not less than $\Phi$ defined in~(\ref{probability2}).

1003: Formula~(\ref{number1}) bounds the first term in the right-hand side

1004: of~(\ref{basic_bound}).

1005:

1006: To bound the second term in the right-hand side of~(\ref{basic_bound}),

1007: we observe that $j \le k$, due to~(\ref{reduced_rank})

1008: and the supposition of the theorem that $l \le m-k$.

1009: Combining~(\ref{good_approx2}), (\ref{product2}),

1010: (\ref{stretched_singular_value2}), and the fact that $j \le k$ yields

1011: %

1012: \begin{equation}

1013: \label{number3}

1014: \| Q \, S - (G \, (A \, A^\T)^i \, A)^\T \|

1015: \le \sqrt{2 l} \; \gamma \; (\sigma_{j+1})^{2i+1}

1016:   + \sqrt{2 \, \max(m-k,l)} \; \gamma \; (\sigma_{k+1})^{2i+1}

1017: \end{equation}

1018: %

1019: with probability not less than $\Psi$ defined in~(\ref{probability32}).

1020: %

Combining~(\ref{number2}) and~(\ref{number3}),
together with the fact that $\sigma_{j+1} \le \sigma_j$, yields

1022: %

1023: \begin{multline}

1024: \label{number4}

1025: \| F \| \, \| Q \, S - (G \, (A \, A^\T)^i \, A)^\T \|

1026: \le \sqrt{2 \, l^2 \, \gamma^2 \, \beta^2} \; \sigma_{j+1} \\

1027:   + \sqrt{2 \, l \, \max(m-k,l) \, \gamma^2 \, \beta^2

1028:  \, \left(\frac{\sigma_{k+1}}{\sigma_j}\right)^{4i}} \;\; \sigma_{k+1}

1029: \end{multline}

1030: %

1031: with probability not less than $\Pi$ defined in~(\ref{final_prob}).

1032: The combination of Lemma~\ref{monotonicity}, (\ref{monotonicity_assump}),

1033: and the fact that $j \le k$ justifies the use of $k$

1034: (rather than the $j$ used in~(\ref{probability2}) for $\Phi$)

1035: in the last term in the right-hand side of~(\ref{final_prob}).

1036:

1037: Combining~(\ref{basic_bound}), (\ref{number1}), (\ref{number4}),

1038: (\ref{reduced_rank}), (\ref{simplifying_assump}),

1039: and the supposition of the theorem that $l \le m-k$

1040: yields~(\ref{intermediate_step}), completing the proof.

1041: \end{proof}

1042:

1043: \begin{remark}

1044: \label{par_remark}

1045: Choosing~$l=k+12$, $\beta = 2.57$, and $\gamma = 2.43$ in~(\ref{final_prob})

1046: and~(\ref{the_point}) yields

1047: %

1048: \begin{equation}

1049: \label{explicit_eval}

1050: \| A - U \, \Sigma \, V^\T \| \le 100 \, l

1051: \, \left(\frac{m-k}{l}\right)^{1/(4i+2)} \, \sigma_{k+1}

1052: \end{equation}

1053: %

1054: with probability greater than $1-10^{-15}$,

1055: where $\sigma_{k+1}$ is the $(k+1)^\st$ greatest singular value of $A$.

1056: Numerical experiments (some of which are reported in Section~\ref{numerical})

1057: indicate that the factor $100 l$ in the right-hand side

1058: of~(\ref{explicit_eval}) is much greater than necessary.

1059: \end{remark}
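The numbers in Remark~\ref{par_remark} can be checked directly.
The following Python sketch (the function name is hypothetical) sums the three
terms subtracted from~1 in~(\ref{final_prob}).
Since the first two terms decrease as $m-k$ and $l$ grow,
while the last depends only on $l-k+1 = 13$,
the worst case for $l = k+12$ is $k = 1$ and $m-k = l = 13$;
there the sum is slightly less than $10^{-15}$.
The constant in~(\ref{the_point}) is
$16 \, \gamma \, \beta = 16 \cdot 2.43 \cdot 2.57 \approx 99.92 \le 100$.

\begin{verbatim}
import math


def failure_probability(k, l, m, beta, gamma):
    """Return 1 - Pi, i.e., the sum of the three terms subtracted
    from 1 in the definition (final_prob) of Pi."""
    g2 = gamma * gamma

    def gaussian_term(d):
        # (2 gamma^2 / e^(gamma^2 - 1))^d / (2 (gamma^2 - 1) sqrt(pi d gamma^2))
        ratio = 2.0 * g2 / math.exp(g2 - 1.0)
        return ratio ** d / (2.0 * (g2 - 1.0) * math.sqrt(math.pi * d * g2))

    p = l - k + 1
    last = (math.e / (p * beta)) ** p / math.sqrt(2.0 * math.pi * p)
    return gaussian_term(m - k) + gaussian_term(l) + last


beta, gamma = 2.57, 2.43
k, l, m = 1, 13, 14                       # worst case for l = k + 12
assert (l - k + 1) * beta >= 1            # assumption (monotonicity_assump)
assert 2 * l**2 * gamma**2 * beta**2 >= 1 # assumption (simplifying_assump)
print(failure_probability(k, l, m, beta, gamma))  # slightly below 1e-15
print(16 * gamma * beta)                          # about 99.92
\end{verbatim}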

1060:

1061:

1062: \begin{remark}

1063: \label{six-step}

Above, we permit $l$ to be any integer with $k < l \le m-k$.

1065: Stronger theoretical bounds on the accuracy are available when $l \ge 2k$.

1066: Indeed, via an analysis similar to the proof of Theorem~\ref{the_theorem}

1067: (using in addition the result stated in the abstract of~\cite{chen-dongarra}),

1068: it can be shown that the following six-step algorithm with $l \ge 2k$

1069: produces matrices $U$, $\Sigma$, and $V$ satisfying the bound~(\ref{the_point})

1070: with its right-hand side reduced by a factor of $\sqrt{l}$:

1071: %

1072: \begin{enumerate}

1073: %

1074: \item[1.] Using a random number generator,

1075: form a real $l \times m$ matrix $G$ whose entries are

1076: i.i.d.\ Gaussian random variables of zero mean and unit variance,

1077: and compute the $l \times n$ product matrix

1078: %

1079: \begin{equation}

1080: \label{product2a}

1081: R = G \, (A \, A^\T)^i \, A.

1082: \end{equation}

1083: %

1084: \item[2.] Using a pivoted $QR$-decomposition algorithm,

1085: form a real $n \times l$ matrix $Q$ whose columns are orthonormal,

1086: such that there exists a real $l \times l$ matrix $S$ for which

1087: %

1088: \begin{equation}

1089: \label{good_approx2a}

1090: R^\T = Q \, S.

1091: \end{equation}

1092: %

1093: (See, for example, Chapter~5 in~\cite{golub-van_loan} for details concerning

1094: the construction of such a matrix $Q$.)

1095: %

1096: \item[3.] Compute the $m \times l$ product matrix

1097: %

1098: \begin{equation}

1099: \label{product_ta}

1100: T = A \, Q.

1101: \end{equation}

1102: %

1103: \item[4.] Form an SVD of $T$,

1104: %

1105: \begin{equation}

1106: \label{svd_smalla}

1107: T = \tilde{U} \, \tilde{\Sigma} \, W^\T,

1108: \end{equation}

1109: %

1110: where $\tilde{U}$ is a real $m \times l$ matrix whose columns are orthonormal,

1111: $W$ is a real $l \times l$ matrix whose columns are orthonormal,

1112: and $\tilde{\Sigma}$ is a real diagonal $l \times l$ matrix

1113: whose only nonzero entries are nonnegative and appear in nonincreasing order

1114: on the diagonal.

1115: (See, for example, Chapter~8 in~\cite{golub-van_loan} for details

1116: concerning the construction of such an SVD.)

1117: %

1118: \item[5.] Compute the $n \times l$ product matrix

1119: %

1120: \begin{equation}

1121: \label{product3a}

1122: \tilde{V} = Q \, W.

1123: \end{equation}

1124: %

1125: \item[6.] Extract the leftmost $m \times k$ block $U$ of $\tilde{U}$,

1126: the leftmost $n \times k$ block $V$ of $\tilde{V}$,

1127: and the leftmost uppermost $k \times k$ block $\Sigma$ of $\tilde{\Sigma}$.

1128: %

1129: \end{enumerate}

1130: %

1131: \end{remark}
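A minimal NumPy sketch of this six-step variant follows; it is our own
illustration, and the function name is hypothetical.
One substitution is made deliberately: {\tt numpy.linalg.qr} computes an
unpivoted Householder $QR$-decomposition, which we use as a stand-in for the
pivoted $QR$-decomposition of Step~2.
In exact arithmetic either choice yields an orthonormal $n \times l$ matrix $Q$
with $R^\T = Q \, S$; the pivoting is intended to improve numerical robustness,
which this sketch does not attempt to reproduce.

\begin{verbatim}
import numpy as np


def randomized_pca_oversampled(A, k, l, i, rng=None):
    """Sketch of the six-step variant (intended for l >= 2k)."""
    rng = np.random.default_rng() if rng is None else rng
    # Step 1: R = G (A A^T)^i A with an l x m Gaussian matrix G.
    R = rng.standard_normal((l, A.shape[0])) @ A
    for _ in range(i):
        R = (R @ A.T) @ A
    # Step 2: orthonormal n x l matrix Q with R^T = Q S.
    # ASSUMPTION: unpivoted Householder QR stands in for pivoted QR here.
    Q, _ = np.linalg.qr(R.T)
    # Steps 3-4: T = A Q (m x l) and its SVD T = U~ diag(s~) W^T.
    U_t, s_t, Wt = np.linalg.svd(A @ Q, full_matrices=False)
    # Step 5: V~ = Q W (n x l).
    V_t = Q @ Wt.T
    # Step 6: keep the leading k columns/entries; numpy.linalg.svd
    # returns the singular values in nonincreasing order.
    return U_t[:, :k], s_t[:k], V_t[:, :k]
\end{verbatim}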

1132:

1133:

1134:

1135: \subsection{Computational costs}

1136: \label{costs}

1137:

1138: In this subsection, we tabulate the number of floating-point operations

1139: required by the five-step algorithm described

1140: in Subsection~\ref{main_algorithm} as applied once to a matrix $A$.

1141:

1142: The algorithm incurs the following costs

1143: in order to compute an approximation to an SVD of $A$:

1144: %

1145: \begin{enumerate}

1146: %

1147: \item[1.] Forming $R$ in~(\ref{product2}) requires applying $A$

1148:           to $il$ column vectors, and $A^\T$ to $(i+1) \, l$ column vectors.

1149: %

1150: \item[2.] Computing $Q$ in~(\ref{good_approx2})

1151:           costs~$\bigoh(l^2 \, n)$.

1152: %

1153: \item[3.] Forming $T$ in~(\ref{product_t}) requires applying $A$

1154:           to $k$ column vectors.

1155: %

1156: \item[4.] Computing the SVD~(\ref{svd_small}) of $T$ costs~$\bigoh(k^2 \, m)$.

1157: %

1158: \item[5.] Forming $V$ in~(\ref{product3}) costs~$\bigoh(k^2 \, n)$.

1159: %

1160: \end{enumerate}

1161: %

1162: Summing up the costs in Steps 1--5 above,

1163: and using the fact that $k \le l \le m \le n$,

1164: we conclude that the algorithm of Subsection~\ref{main_algorithm} costs

1165: %

1166: \begin{equation}

1167: \label{svd_costs}

1168: C_{\rm PCA} = (il+k) \cdot C_A + (il+l) \cdot C_{A^\tinyT} + \bigoh(l^2 \, n)

1169: \end{equation}

1170: %

1171: floating-point operations,

1172: where $C_A$ is the cost of applying $A$ to a real $n \times 1$ column vector,

1173: and $C_{A^\tinyT}$ is the cost of applying $A^\T$

1174: to a real $m \times 1$ column vector.

1175:

1176: \begin{remark}

1177: We observe that the algorithm

1178: only requires applying $A$ to $il+k$ vectors and $A^\T$ to $il+l$ vectors;

1179: it does not require explicit access to the individual entries of $A$.

1180: This consideration can be important when $A$ and $A^\T$ are available solely

in the form of procedures for applying them to arbitrary vectors.

1182: Often such procedures for applying $A$ and $A^\T$ cost much less than

1183: the standard procedure for applying a dense matrix to a vector.

1184: \end{remark}
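To illustrate this remark, the following sketch (all names hypothetical) runs
the algorithm of Subsection~\ref{main_algorithm} using only two callables that
apply $A$ and $A^\T$ to blocks of column vectors, and counts the vectors
processed; the counts agree with the $il+k$ and $il+l$ in~(\ref{svd_costs}).
For simplicity the sketch draws $G^\T$ (an $m \times l$ Gaussian matrix)
directly and wraps a dense matrix, whereas in practice the callables would
implement, say, a sparse product or a fast transform.

\begin{verbatim}
import numpy as np


class CountingOperator:
    """Hypothetical wrapper: applies a dense matrix to a block of column
    vectors and counts how many vectors it has been applied to."""

    def __init__(self, M):
        self.M = M
        self.count = 0

    def __call__(self, X):
        self.count += X.shape[1]
        return self.M @ X


def randomized_pca_matfree(apply_A, apply_AT, m, k, l, i, rng):
    # R^T = A^T (A A^T)^i G^T, built only from products with A and A^T.
    Y = apply_AT(rng.standard_normal((m, l)))   # l applications of A^T
    for _ in range(i):
        Y = apply_AT(apply_A(Y))                # l of A and l of A^T
    Q = np.linalg.svd(Y, full_matrices=False)[0][:, :k]
    # T = A Q needs k more applications of A.
    U, s, Wt = np.linalg.svd(apply_A(Q), full_matrices=False)
    return U, s, Q @ Wt.T
\end{verbatim}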

1185:

1186:

1187:

1188: \subsection{A modified algorithm}

1189: \label{modified}

1190:

1191: In this subsection, we describe a simple modification

1192: of the algorithm described in Subsection~\ref{main_algorithm}.

1193: Again, suppose that $i$, $k$, $l$, $m$, and $n$ are positive integers

1194: with $k < l \le m-k$ and $m \le n$, and $A$ is a real $m \times n$ matrix.

1195: Then, the following five-step algorithm constructs an approximation

1196: to an SVD of $A^\T$ such that

1197: %

1198: \begin{equation}

1199: \label{sort_of_svdmod}

1200: \| A^\T - U \, \Sigma \, V^\T \| \le C \, m^{1/(4i)} \, \sigma_{k+1}

1201: \end{equation}

1202: %

1203: with very high probability,

1204: where $U$ is a real $n \times k$ matrix whose columns are orthonormal,

1205: $V$ is a real $m \times k$ matrix whose columns are orthonormal,

1206: $\Sigma$ is a real diagonal $k \times k$ matrix

1207: whose entries are all nonnegative,

1208: $\sigma_{k+1}$ is the $(k+1)^\st$ greatest singular value of $A$,

1209: and $C$ is a constant independent of $A$ that depends on the parameters

1210: of the algorithm:

1211:

1212: \begin{enumerate}

1213: %

1214: \item[1.] Using a random number generator,

1215: form a real $l \times m$ matrix $G$ whose entries are

1216: i.i.d.\ Gaussian random variables of zero mean and unit variance,

1217: and compute the $l \times m$ product matrix

1218: %

1219: \begin{equation}

1220: \label{product2mod}

1221: R = G \, (A \, A^\T)^i.

1222: \end{equation}

1223: %

1224: \item[2.] Using an SVD,

1225: form a real $m \times k$ matrix $Q$ whose columns are orthonormal,

1226: such that there exists a real $k \times l$ matrix $S$ for which

1227: %

1228: \begin{equation}

1229: \label{good_approx2mod}

1230: \| Q \, S - R^\T \| \le \rho_{k+1},

1231: \end{equation}

1232: %

1233: where $\rho_{k+1}$ is the $(k+1)^\st$ greatest singular value of $R$.

1234: (See Observation~\ref{least_squares} for details concerning

1235: the construction of such a matrix $Q$.)

1236: %

1237: \item[3.] Compute the $n \times k$ product matrix

1238: %

1239: \begin{equation}

1240: \label{product_tmod}

1241: T = A^\T \, Q.

1242: \end{equation}

1243: %

1244: \item[4.] Form an SVD of $T$,

1245: %

1246: \begin{equation}

1247: \label{svd_smallmod}

1248: T = U \, \Sigma \, W^\T,

1249: \end{equation}

1250: %

1251: where $U$ is a real $n \times k$ matrix whose columns are orthonormal,

1252: $W$ is a real $k \times k$ matrix whose columns are orthonormal,

1253: and $\Sigma$ is a real diagonal $k \times k$ matrix

1254: whose entries are all nonnegative.

1255: (See, for example, Chapter~8 in~\cite{golub-van_loan} for details

1256: concerning the construction of such an SVD.)

1257: %

1258: \item[5.] Compute the $m \times k$ product matrix

1259: %

1260: \begin{equation}

1261: \label{product3mod}

1262: V = Q \, W.

1263: \end{equation}

1264: %

1265: \end{enumerate}
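A NumPy sketch of the modified algorithm is below (our own illustration;
the function name is hypothetical).
It differs from the algorithm of Subsection~\ref{main_algorithm} only in
Step~1, which omits the final factor of $A$, and in Steps~3--5,
which apply $A^\T$ in place of $A$, so that $U$ is $n \times k$
and $V$ is $m \times k$.

\begin{verbatim}
import numpy as np


def randomized_pca_transpose(A, k, l, i, rng=None):
    """Sketch of the modified algorithm; intended for i >= 1.

    Returns U (n x k), s (length k) and V (m x k) such that
    A^T is approximated by U @ np.diag(s) @ V.T.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Step 1: R = G (A A^T)^i, an l x m matrix.
    R = rng.standard_normal((l, A.shape[0]))
    for _ in range(i):
        R = (R @ A) @ A.T
    # Step 2: Q holds the k leading left singular vectors of R^T (m x k).
    Q = np.linalg.svd(R.T, full_matrices=False)[0][:, :k]
    # Steps 3-4: T = A^T Q (n x k) and its SVD T = U diag(s) W^T.
    U, s, Wt = np.linalg.svd(A.T @ Q, full_matrices=False)
    # Step 5: V = Q W (m x k).
    return U, s, Q @ Wt.T
\end{verbatim}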

1266:

1267: Clearly, (\ref{sort_of_svdmod}) is similar to~(\ref{sort_of_svd}),

1268: as~(\ref{product2mod}) is similar to~(\ref{product2}).

1269:

1270: \begin{remark}

1271: The ideas of Remark~\ref{six-step}

1272: are obviously relevant to the algorithm of the present subsection, too.

1273: \end{remark}

1274:

1275:

1276:

1277: \subsection{Blanczos}

1278: \label{blanczos}

1279:

1280: In this subsection, we describe a modification of the algorithm

1281: of Subsection~\ref{main_algorithm}, enhancing the accuracy

1282: at a little extra computational expense.

1283: Suppose that $i$, $k$, $l$, $m$, and $n$ are positive integers

1284: with $k < l$ and $(i+1)l \le m-k$, and $A$ is a real $m \times n$ matrix,

1285: such that $m \le n$.

1286: Then, the following five-step algorithm constructs an approximation

1287: $U \, \Sigma \, V^\T$ to an SVD of $A$:

1288:

1289: \begin{enumerate}

1290: %

1291: \item[1.] Using a random number generator,

1292: form a real $l \times m$ matrix $G$ whose entries are

1293: i.i.d.\ Gaussian random variables of zero mean and unit variance,

1294: and compute the $l \times n$ matrices

1295: $R^{(0)}$, $R^{(1)}$, \dots, $R^{(i-1)}$, $R^{(i)}$

1296: defined via the formulae

1297: %

1298: \begin{equation}

1299: R^{(0)} = G \, A,

1300: \end{equation}

1301: %

1302: \begin{equation}

R^{(1)} = R^{(0)} \, A^\T \, A,

1304: \end{equation}

1305: %

1306: \begin{equation}

R^{(2)} = R^{(1)} \, A^\T \, A,

1308: \end{equation}

1309: %

1310: \begin{equation*}

1311: \vdots

1312: \end{equation*}

1313: %

1314: \begin{equation}

R^{(i-1)} = R^{(i-2)} \, A^\T \, A,

1316: \end{equation}

1317: %

1318: \begin{equation}

R^{(i)} = R^{(i-1)} \, A^\T \, A.

1320: \end{equation}

1321: %

1322: Form the $((i+1)l) \times n$ matrix

1323: %

1324: \begin{equation}

1325: \label{product23}

1326: R = \left(\begin{array}{c} R^{(0)} \\ R^{(1)} \\ \vdots \\ R^{(i-1)} \\ R^{(i)}

1327: \end{array}\right).

1328: \end{equation}

1329: %

1330: \item[2.] Using a pivoted $QR$-decomposition algorithm,

1331: form a real $n \times ((i+1)l)$ matrix $Q$ whose columns are orthonormal,

1332: such that there exists a real $((i+1)l) \times ((i+1)l)$ matrix $S$ for which

1333: %

1334: \begin{equation}

1335: \label{good_approx23}

1336: R^\T = Q \, S.

1337: \end{equation}

1338: %

1339: (See, for example, Chapter~5 in~\cite{golub-van_loan} for details concerning

1340: the construction of such a matrix $Q$.)

1341: %

1342: \item[3.] Compute the $m \times ((i+1)l)$ product matrix

1343: %

1344: \begin{equation}

1345: \label{product_t3}

1346: T = A \, Q.

1347: \end{equation}

1348: %

1349: \item[4.] Form an SVD of $T$,

1350: %

1351: \begin{equation}

1352: \label{svd_small3}

1353: T = U \, \Sigma \, W^\T,

1354: \end{equation}

1355: %

1356: where $U$ is a real $m \times ((i+1)l)$ matrix whose columns are orthonormal,

1357: $W$ is a real $((i+1)l) \times ((i+1)l)$ matrix whose columns are orthonormal,

1358: and $\Sigma$ is a real diagonal $((i+1)l) \times ((i+1)l)$ matrix

1359: whose entries are all nonnegative.

1360: (See, for example, Chapter~8 in~\cite{golub-van_loan} for details

1361: concerning the construction of such an SVD.)

1362: %

1363: \item[5.] Compute the $n \times ((i+1)l)$ product matrix

1364: %

1365: \begin{equation}

1366: \label{product33}

1367: V = Q \, W.

1368: \end{equation}

1369: %

1370: \end{enumerate}

1371:

1372: An analysis similar to the proof of Theorem~\ref{the_theorem} above

1373: shows that the matrices $U$, $\Sigma$, and $V$ produced

1374: by the algorithm of the present subsection satisfy

1375: the same upper bounds~(\ref{the_point}) and~(\ref{explicit_eval})

1376: as the matrices produced by the algorithm of Subsection~\ref{main_algorithm}.

If desired, one may produce a similarly accurate rank-$k$ approximation
by permuting the columns of $U$ and $V$ and the diagonal entries of $\Sigma$
so that the diagonal entries of $\Sigma$ appear in nonincreasing order,
and then retaining only the leftmost $k$ columns of $U$,
the leftmost $k$ columns of $V$,
and the leftmost uppermost $k \times k$ block of $\Sigma$.

1383: We will refer to the algorithm of the present subsection

1384: as ``blanczos,'' due to its similarity with the block Lanczos method

1385: (see, for example, Subsection~9.2.6 in~\cite{golub-van_loan}

1386: for a description of the block Lanczos method).
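The following NumPy sketch of blanczos, including the optional truncation to
rank~$k$, is our own illustration (the function name is hypothetical).
It stacks the $i+1$ blocks $R^{(0)}$, \dots, $R^{(i)}$ as in the text,
uses the unpivoted {\tt numpy.linalg.qr} as a stand-in for the pivoted
$QR$-decomposition of Step~2, and relies on {\tt numpy.linalg.svd} returning
singular values in nonincreasing order, so that the truncation is a slice.
It assumes $n \ge (i+1) \, l$, which follows from $(i+1) \, l \le m-k$
and $m \le n$.

\begin{verbatim}
import numpy as np


def blanczos(A, k, l, i, rng=None):
    """Sketch of blanczos followed by truncation to rank k.

    Assumes (i + 1) * l <= n, so that Q below is n x ((i+1) l).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Step 1: R^(0) = G A and R^(j) = R^(j-1) A^T A for j = 1, ..., i.
    blocks = [rng.standard_normal((l, A.shape[0])) @ A]
    for _ in range(i):
        blocks.append((blocks[-1] @ A.T) @ A)
    R = np.vstack(blocks)                 # ((i+1) l) x n
    # Step 2: orthonormal Q with R^T = Q S.
    # ASSUMPTION: unpivoted Householder QR stands in for pivoted QR here.
    Q, _ = np.linalg.qr(R.T)              # n x ((i+1) l)
    # Steps 3-5: T = A Q, T = U diag(s) W^T, V = Q W.
    U, s, Wt = np.linalg.svd(A @ Q, full_matrices=False)
    V = Q @ Wt.T
    # Optional rank-k truncation; s is already in nonincreasing order.
    return U[:, :k], s[:k], V[:, :k]
\end{verbatim}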

1387:

1388:

1389:

1390: \section{Numerical results}

1391: \label{numerical}

1392:

1393: In this section, we illustrate the performance of the algorithm

1394: of the present paper via several numerical examples.

1395:

1396: We use the algorithm to construct a rank-$k$ approximation,

1397: with $k = 10$, to the $m \times (2m)$ matrix $A$ defined

1398: via its singular value decomposition

1399: %

1400: \begin{equation}

1401: \label{test_matrix}

1402: A = U^{(A)} \, \Sigma^{(A)} \, (V^{(A)})^\T,

1403: \end{equation}

1404: %

1405: where $U^{(A)}$ is an $m \times m$ Hadamard matrix

1406: (a unitary matrix whose entries are all $\pm 1/\sqrt{m}$),

1407: $V^{(A)}$ is a $(2m) \times (2m)$ Hadamard matrix,

1408: and $\Sigma^{(A)}$ is an $m \times (2m)$ matrix

1409: whose entries are zero off the main diagonal,

1410: and whose diagonal entries are defined

1411: in terms of the $(k+1)^\st$ singular value $\sigma_{k+1}$ via the formulae

1412: %

1413: \begin{equation}

1414: \Sigma^{(A)}_{j,j} = \sigma_j = (\sigma_{k+1})^{\lfloor j/2 \rfloor/5}

1415: \end{equation}

1416: %

1417: for $j = 1$,~$2$, \dots, $9$,~$10$,

1418: where $\lfloor j/2 \rfloor$ is the greatest integer less than

1419: or equal to $j/2$, and

1420: %

1421: \begin{equation}

1422: \Sigma^{(A)}_{j,j} = \sigma_j = \sigma_{k+1} \cdot \frac{m-j}{m-11}

1423: \end{equation}

1424: %

1425: for $j = 11$,~$12$, \dots, $m-1$,~$m$.

1426: Thus, $\sigma_1 = 1$ and $\sigma_k = \sigma_{k+1}$ (recall that $k = 10$).

1427: We always choose $\sigma_{k+1} < 1$,

1428: so that $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_{m-1} \ge \sigma_m$.
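For readers who wish to reproduce such test matrices at small sizes,
the following NumPy sketch (function names hypothetical) forms $A$
in~(\ref{test_matrix}) densely.
It assumes Sylvester's construction of the Hadamard matrices,
which requires $m$ to be a power of~2, and it hard-codes $k = 10$,
since the formula for $\sigma_j$ with $j > k$ involves $m - 11$.
Forming $A$ densely is practical only for small $m$;
the experiments reported below apply $U^{(A)}$ and $V^{(A)}$ via fast transforms.

\begin{verbatim}
import numpy as np


def sylvester_hadamard(n):
    """Normalized n x n Hadamard matrix via Sylvester's construction
    (ASSUMPTION: n is a power of 2)."""
    H = np.ones((1, 1))
    while H.shape[0] < n:
        H = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), H)
    return H / np.sqrt(n)


def make_test_matrix(m, sigma_k1):
    """Dense m x (2m) matrix A = U^(A) Sigma^(A) (V^(A))^T with k = 10."""
    j = np.arange(1, m + 1)
    sigma = np.where(j <= 10,
                     sigma_k1 ** ((j // 2) / 5.0),     # j = 1, ..., 10
                     sigma_k1 * (m - j) / (m - 11.0))  # j = 11, ..., m
    Sigma = np.zeros((m, 2 * m))
    Sigma[np.arange(m), np.arange(m)] = sigma
    A = sylvester_hadamard(m) @ Sigma @ sylvester_hadamard(2 * m).T
    return A, sigma
\end{verbatim}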

1429:

1430: Figure~1 plots the singular values

1431: $\sigma_1$,~$\sigma_2$, \dots, $\sigma_{m-1}$,~$\sigma_m$

1432: of $A$ with $m = 512$ and $\sigma_{k+1} = .001$;

1433: these parameters correspond to the first row of numbers in Table~1,

1434: the first row of numbers in Table~2, and the first row of numbers in Table~6.

1435:

1436: Table~1 reports the results of applying the five-step algorithm

1437: of Subsection~\ref{main_algorithm} to matrices of various sizes, with $i = 1$.

1438: Table~2 reports the results of applying the five-step algorithm

1439: of Subsection~\ref{main_algorithm} to matrices of various sizes, with $i = 0$.

1440: The algorithms of~\cite{sarlos3}, \cite{sarlos4},

1441: and~\cite{liberty-woolfe-martinsson-rokhlin-tygert}

1442: for low-rank approximation are essentially the same as the algorithm used

1443: for Table~2 (with $i=0$).

1444:

1445: Table~3 reports the results of applying the five-step algorithms

1446: of Subsections~\ref{main_algorithm} and~\ref{modified}

1447: with varying numbers of iterations $i$.

1448: Rows in the table where $i$ is enclosed in parentheses correspond

1449: to the algorithm of Subsection~\ref{modified};

1450: rows where $i$ is not enclosed in parentheses correspond

1451: to the algorithm of Subsection~\ref{main_algorithm}.

1452:

1453: Table~4 reports the results of applying the five-step algorithm

1454: of Subsection~\ref{main_algorithm} to matrices

1455: whose best rank-$k$ approximations have varying accuracies.

1456: Table~5 reports the results of applying the blanczos algorithm

1457: of Subsection~\ref{blanczos} to matrices

1458: whose best rank-$k$ approximations have varying accuracies.

1459:

1460: Table~6 reports the results of calculating pivoted $QR$-decompositions,

1461: via plane (Householder) reflections, of matrices of various sizes.

1462: We computed the pivoted $QR$-decomposition of the transpose of $A$ defined

1463: in~(\ref{test_matrix}), rather than of $A$ itself, for reasons of accuracy

1464: and efficiency. As pivoted $QR$-decomposition requires dense matrix arithmetic,

1465: our 1~GB of random-access memory (RAM) imposed the limit $m \le 4096$

1466: for Table~6.

1467:

1468: The headings of the tables have the following meanings:

1469: %

1470: \begin{itemize}

1471: %

1472: \item $m$ is the number of rows in $A$, the matrix being approximated.

1473: %

1474: \item $n$ is the number of columns in $A$, the matrix being approximated.

1475: %

1476: \item $i$ is the integer parameter used in the algorithms

1477:       of Subsections~\ref{main_algorithm}, \ref{modified}, and~\ref{blanczos}.

1478:       Rows in the tables where $i$ is enclosed in parentheses correspond

1479:       to the algorithm of Subsection~\ref{modified};

1480:       rows where $i$ is not enclosed in parentheses correspond

1481:       to either the algorithm of Subsection~\ref{main_algorithm} or

1482:       that of Subsection~\ref{blanczos}.

1483: %

1484: \item $t$ is the time in seconds required by the algorithm to create

1485:       an approximation and compute its accuracy $\delta$.

1486: %

1487: \item $\sigma_{k+1}$ is the $(k+1)^\st$ greatest singular value of $A$,

1488:       the matrix being approximated; $\sigma_{k+1}$ is also the accuracy

1489:       of the best possible rank-$k$ approximation to $A$.

1490: %

1491: \item $\delta$ is the accuracy of the approximation $U \, \Sigma \, V^\T$

1492:       (or $(QRP)^\T$, for Table~6) constructed by the algorithm.

1493:       For Tables~1--5,

1494: %

1495: \begin{equation}

1496: \delta = \| A - U \, \Sigma \, V^\T \|,

1497: \end{equation}

1498: %

1499: where $U$ is an $m \times k$ matrix whose columns are orthonormal,

1500: $V$ is an $n \times k$ matrix whose columns are orthonormal,

1501: and $\Sigma$ is a diagonal $k \times k$ matrix whose entries

1502: are all nonnegative; for Table~6,

1503: %

1504: \begin{equation}

1505: \delta = \| A - (QRP)^\T \|,

1506: \end{equation}

1507: %

1508: where $P$ is an $m \times m$ permutation matrix,

1509: $R$ is a $k \times m$ upper-triangular (meaning upper-trapezoidal) matrix,

1510: and $Q$ is an $n \times k$ matrix whose columns are orthonormal.

1511: \end{itemize}

1512:

1513: The values for $t$ are the average values over 3 independent randomized trials

1514: of the algorithm. The values for $\delta$ are the worst (maximum) values

1515: encountered in 3 independent randomized trials of the algorithm.

1516: The values for $\delta$ in each trial are those produced by 20 iterations

1517: of the power method applied to $A - U \, \Sigma \, V^\T$

1518: (or $A - (QRP)^\T$, for Table~6),

1519: started with a vector whose entries

1520: are i.i.d.\ centered Gaussian random variables.

1521: The theorems of~\cite{dixon} and~\cite{kuczynski-wozniakowski}

1522: guarantee that this power method produces accurate results

1523: with overwhelmingly high probability.
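A minimal NumPy version of this accuracy estimate follows
(the function name is hypothetical).
It runs the power method on $E^\T E$, where $E$ is accessed only through
products with $E$ and $E^\T$, and returns the largest $\|E \, x\|$
observed over unit vectors $x$; every such value is a lower bound on $\|E\|$.
For $E = A - U \, \Sigma \, V^\T$ one may pass, for example,
{\tt lambda x: A @ x - U @ (s * (V.T @ x))} and
{\tt lambda y: A.T @ y - V @ (s * (U.T @ y))}.

\begin{verbatim}
import numpy as np


def spectral_norm_estimate(apply_E, apply_Et, n, iterations=20, rng=None):
    """Power-method estimate of ||E|| from products with E and E^T.

    Runs the power method on E^T E from a Gaussian starting vector and
    returns the largest ||E x|| seen for unit x (a lower bound on ||E||).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    estimate = 0.0
    for _ in range(iterations):
        y = apply_E(x)
        estimate = max(estimate, np.linalg.norm(y))
        z = apply_Et(y)
        z_norm = np.linalg.norm(z)
        if z_norm == 0.0:   # E^T E x = 0, hence E x = 0: stop iterating
            break
        x = z / z_norm
    return estimate
\end{verbatim}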

1524:

1525: We performed all computations using IEEE standard double-precision variables,

1526: whose mantissas have approximately one bit of precision less than 16 digits

1527: (so that the relative precision of the variables is approximately .2E--15).

1528: We ran all computations on one core

1529: of a 1.86~GHz Intel Centrino Core Duo microprocessor

1530: with 2~MB of L2 cache and 1~GB of RAM.

1531: We compiled the Fortran~77 code

1532: using the Lahey/Fujitsu Linux Express v6.2 compiler,

1533: with the optimization flag {\tt {-}{-}o2} enabled.

1534: We implemented a fast Walsh-Hadamard transform

1535: to apply rapidly the Hadamard matrices $U^{(A)}$ and $V^{(A)}$

1536: in~(\ref{test_matrix}).

1537: We used plane (Householder) reflections

1538: to compute all pivoted $QR$-decompositions.

1539: We used the LAPACK 3.1.1 divide-and-conquer SVD routine {\tt dgesdd}

1540: to compute all full SVDs.

1541: For the parameter $l$, we set $l = 12$ $(= k+2)$

1542: for all of the examples reported here.
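For completeness, here is a minimal NumPy sketch of a fast Walsh-Hadamard
transform of the kind mentioned above; it is not the authors' Fortran~77 code.
It assumes the Sylvester ordering of the Hadamard matrix and normalizes by
$1/\sqrt{m}$, so that it applies an orthogonal matrix to a vector of length
$m = 2^p$ using $p$ butterfly passes, that is, $\bigoh(m \, \log(m))$
floating-point operations.

\begin{verbatim}
import numpy as np


def fwht(x):
    """Fast Walsh-Hadamard transform of a vector whose length is a
    power of 2, normalized by 1/sqrt(len(x)); Sylvester ordering."""
    y = np.array(x, dtype=float)
    h = 1
    while h < len(y):
        # Butterfly pass: combine blocks of length h pairwise.
        for start in range(0, len(y), 2 * h):
            a = y[start:start + h].copy()
            b = y[start + h:start + 2 * h]
            y[start:start + h] = a + b
            y[start + h:start + 2 * h] = a - b
        h *= 2
    return y / np.sqrt(len(y))
\end{verbatim}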

1543:

1544: The experiments reported here and our further tests point

1545: to the following:

1546:

1547: \begin{enumerate}

1548: %

1549: \item The accuracies in Table~1 are superior to those in Table~2;

1550: the algorithm performs much better with $i>0$.

1551: (The algorithms of~\cite{liberty-woolfe-martinsson-rokhlin-tygert},

1552: \cite{sarlos3}, and~\cite{sarlos4}

1553: for low-rank approximation are essentially the same as the algorithm used

1554: for Tables~1 and~2 when $i=0$.)

1555: %

1556: \item The accuracies in Table~1 are superior to the corresponding accuracies

1557: in Table~6; the algorithm of the present paper produces higher accuracy

1558: than the classical pivoted $QR$-decomposition for matrices whose spectra

1559: decay slowly (such as those matrices tested in the present section).

1560: %

1561: \item The accuracies in Tables~1--3 appear to be proportional

1562: to $m^{1/(4i+2)} \, \sigma_{k+1}$ for the algorithm

1563: of Subsection~\ref{main_algorithm},

1564: and to be proportional to $m^{1/(4i)} \, \sigma_{k+1}$ for the algorithm

1565: of Subsection~\ref{modified},

1566: in accordance with~(\ref{sort_of_svd}) and~(\ref{sort_of_svdmod}).

1567: The numerical results reported here, as well as our further experiments,

1568: indicate that the theoretical bound~(\ref{the_point}) on the accuracy

1569: should remain valid with a greatly reduced constant in the right-hand side,

1570: independent of the matrix $A$ being approximated.

1571: See item~6 below for a discussion of Tables~4 and~5.

1572: %

1573: \item The timings in Tables~1--5 are consistent with~(\ref{svd_costs}),

1574: as we could (and did) apply the Hadamard matrices $U^{(A)}$ and $V^{(A)}$

1575: in~(\ref{test_matrix}) to vectors via fast Walsh-Hadamard transforms

1576: at a cost of $\bigoh(m \, \log(m))$ floating-point operations

1577: per matrix-vector multiplication.

1578: %

1579: \item The quality of the pseudorandom number generator has almost no effect

1580: on the accuracy of the algorithm, nor does substituting uniform variates

1581: for the normal variates.

1582: %

1583: \item The accuracies in Table~5 are superior to those in Table~4,

1584: particularly when the $k^\th$ greatest singular value $\sigma_k$

1585: of the matrix $A$ being approximated is very small. Understandably,

1586: the algorithm of Subsection~\ref{main_algorithm} would seem to break down

1587: when $(\sigma_k)^{2i+1}$ is less than the machine precision

1588: while $\sigma_k$ itself is not;

1589: the blanczos algorithm of Subsection~\ref{blanczos} does not.

1590: When $(\sigma_k)^{2i+1}$ is much less than the machine precision,

1591: while $\sigma_k$ is not,

1592: the accuracy of blanczos in the presence of roundoff is similar to that

1593: of the algorithm of Subsection~\ref{main_algorithm} run with a reduced $i$.

1594: When $(\sigma_k)^{2i+1}$ is much greater than the machine precision,

1595: the accuracy of blanczos is similar to that of the algorithm

1596: of Subsection~\ref{main_algorithm} run with $i$ being the same as

1597: in the blanczos algorithm.

1598: Since the blanczos algorithm of Subsection~\ref{blanczos}

1599: is so tolerant of roundoff,

1600: we suspect that the blanczos algorithm is

1601: a better general-purpose black-box tool

1602: for the computation of principal component analyses,

1603: despite its higher cost as compared with the algorithms

1604: of Subsections~\ref{main_algorithm} and~\ref{modified} (for instance, the timings in Table~5 are nearly twice those in Table~4).

1605: %

1606: \end{enumerate}
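A rough numerical check of the proportionality in item~3 for Tables~1 and~2 divides each $\delta$ by $m^{1/(4i+2)} \, \sigma_{k+1}$, with $i=1$ and $i=0$, respectively. The short Python sketch below uses only the values printed in those tables. While $m$ grows by a factor of 1024, the normalized values stay within a factor of about 1.4 for Table~1 and about 2 for Table~2. This is consistent with the stated scaling, though of course it does not prove it.

```python
import numpy as np

# (m, delta) pairs transcribed from Tables 1 (i = 1) and 2 (i = 0);
# sigma_{k+1} = .001 in both tables.
m = np.array([512, 2048, 8192, 32768, 131072, 524288], dtype=float)
delta = {1: np.array([.0011, .0013, .0018, .0024, .0037, .0039]),
         0: np.array([.012, .027, .039, .053, .110, .220])}
sigma_k1 = 1e-3

normalized = {i: d / (m ** (1.0 / (4 * i + 2)) * sigma_k1)
              for i, d in delta.items()}
for i, r in normalized.items():
    print(f"i = {i}: ratios {np.round(r, 2)}, max/min = {r.max() / r.min():.2f}")
```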

1607:

1608:

1609:

1610: \begin{remark}

1611: A MATLAB\registered\ implementation of the blanczos algorithm

1612: of Subsection~\ref{blanczos} is available on the file exchange at

1613: {\tt http://www.mathworks.com} in the package entitled

1614: ``Principal Component Analysis.''

1615: \end{remark}

1616:

1617:

1618:

1619: \section{Appendix}

1620: \label{appendix}

1621:

1622: In this appendix, we restate and prove Lemmas~\ref{all_together2}

1623: and~\ref{probability_bounds2} from Section~\ref{apparatus}.

1624:

1625: The following lemma, stated earlier as Lemma~\ref{all_together2}

1626: in Section~\ref{apparatus},

1627: shows that the product $A \, Q \, Q^\T$

1628: of matrices $A$, $Q$, and $Q^\T$

1629: is a good approximation to $A$,

1630: provided that there exist matrices $G$ and $S$ such that

1631: %

1632: \begin{enumerate}

1633: %

1634: \item[1.] the columns of $Q$ are orthonormal,

1635: %

1636: \item[2.] $Q \, S$ is a good approximation to $(G \, (A \, A^\T)^i \, A)^\T$,

1637: and

1638: %

1639: \item[3.] there exists a matrix $F$ such that $\| F \|$ is not too large,

1640: and $F \, G \, (A \, A^\T)^i \, A$ is a good approximation to $A$.

1641: %

1642: \end{enumerate}

1643:

1644: \begin{lemma}

1645: \label{all_together22}

1646: Suppose that $i$, $k$, $l$, $m$, and~$n$ are positive integers

1647: with $k \le l \le m \le n$.

1648: Suppose further that $A$ is a real $m \times n$ matrix,

1649: $Q$ is a real $n \times k$ matrix whose columns are orthonormal,

1650: $S$ is a real $k \times l$ matrix,

1651: $F$ is a real $m \times l$ matrix,

1652: and $G$ is a real $l \times m$ matrix.

1653:

1654: Then,

1655: %

1656: \begin{equation}

1657: \label{reconstruction22}

1658: \| A \, Q \, Q^\T - A \|

1659: \le 2 \, \| F \, G \, (A \, A^\T)^i \, A - A \|

1660:   + 2 \, \| F \| \, \| Q \, S - (G \, (A \, A^\T)^i \, A)^\T \|.

1661: \end{equation}

1662: %

1663: \end{lemma}

1664:

1665: \begin{proof}

1666: The proof is straightforward but tedious; it proceeds as follows.

1667:

1668: To simplify notation, we define

1669: %

1670: \begin{equation}

1671: \label{shorter}

1672: B = (A \, A^\T)^i \, A.

1673: \end{equation}

1674:

1675: We obtain from the triangle inequality that

1676: %

1677: \begin{multline}

1678: \label{triangle}

1679: \| A \, Q \, Q^\T - A \|

1680: \le \| A \, Q \, Q^\T - F \, G \, B \, Q \, Q^\T \|

1681:   + \| F \, G \, B \, Q \, Q^\T - F \, G \, B \| \\

1682:   + \| F \, G \, B - A \|.

1683: \end{multline}

1684:

1685: First, we provide a bound

1686: for $\| A \, Q \, Q^\T - F \, G \, B \, Q \, Q^\T \|$.

1687: Clearly,

1688: %

1689: \begin{equation}

1690: \label{bound0}

1691: \| A \, Q \, Q^\T - F \, G \, B \, Q \, Q^\T \|

1692: \le \| A - F \, G \, B \| \, \| Q \| \, \| Q^\T \|.

1693: \end{equation}

1694: %

1695: It follows from the fact that the columns of $Q$ are orthonormal that

1696: %

1697: \begin{equation}

1698: \label{bound1}

1699: \| Q \| \le 1

1700: \end{equation}

1701: %

1702: and

1703: %

1704: \begin{equation}

1705: \label{bound2}

1706: \| Q^\T \| \le 1.

1707: \end{equation}

1708: %

1709: Combining~(\ref{bound0}), (\ref{bound1}), and~(\ref{bound2}) yields

1710: %

1711: \begin{equation}

1712: \label{simpler}

1713: \| A \, Q \, Q^\T - F \, G \, B \, Q \, Q^\T \| \le \| A - F \, G \, B \|.

1714: \end{equation}

1715:

1716: Next, we provide a bound

1717: for $\| F \, G \, B \, Q \, Q^\T - F \, G \, B \|$.

1718: Clearly,

1719: %

1720: \begin{equation}

1721: \label{triangle4}

1722: \| F \, G \, B \, Q \, Q^\T - F \, G \, B \|

1723: \le \| F \| \, \| G \, B \, Q \, Q^\T - G \, B \|.

1724: \end{equation}

1725: %

1726: It follows from the triangle inequality that

1727: %

1728: \begin{multline}

1729: \label{triangle3}

1730: \| G \, B \, Q \, Q^\T - G \, B \|

1731: \le \| G \, B \, Q \, Q^\T - S^\T \, Q^\T \, Q \, Q^\T \| \\

1732:   + \| S^\T \, Q^\T \, Q \, Q^\T - S^\T \, Q^\T \|

1733:   + \| S^\T \, Q^\T - G \, B \|.

1734: \end{multline}

1735:

1736: Furthermore,

1737: %

1738: \begin{equation}

1739: \label{prev}

1740: \| G \, B \, Q \, Q^\T - S^\T \, Q^\T \, Q \, Q^\T \|

1741: \le \| G \, B - S^\T \, Q^\T \| \, \| Q \| \, \| Q^\T \|.

1742: \end{equation}

1743: %

1744: Combining~(\ref{prev}), (\ref{bound1}), and~(\ref{bound2}) yields

1745: %

1746: \begin{equation}

1747: \label{bound4}

1748: \| G \, B \, Q \, Q^\T - S^\T \, Q^\T \, Q \, Q^\T \|

1749: \le \| G \, B - S^\T \, Q^\T \|.

1750: \end{equation}

1751:

1752: Also, it follows from the fact that the columns of $Q$ are orthonormal that

1753: %

1754: \begin{equation}

1755: \label{orthonormal}

1756: Q^\T \, Q = \Id.

1757: \end{equation}

1758: %

1759: It follows from~(\ref{orthonormal}) that

1760: %

1761: \begin{equation}

1762: \label{vanish}

1763: \| S^\T \, Q^\T \, Q \, Q^\T - S^\T \, Q^\T \| = 0.

1764: \end{equation}

1765: %

1766:

1767: Combining~(\ref{triangle3}), (\ref{bound4}), and~(\ref{vanish}) yields

1768: %

1769: \begin{equation}

1770: \label{triangle5}

1771: \| G \, B \, Q \, Q^\T - G \, B \| \le 2 \, \| S^\T \, Q^\T - G \, B \|.

1772: \end{equation}

1773: %

1774: Combining~(\ref{triangle4}) and~(\ref{triangle5}) yields

1775: %

1776: \begin{equation}

1777: \label{triangle6}

1778: \| F \, G \, B \, Q \, Q^\T - F \, G \, B \|

1779: \le 2 \, \| F \| \, \| S^\T \, Q^\T - G \, B \|.

1780: \end{equation}

1781: %

1782:

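1783: Combining~(\ref{triangle}), (\ref{simpler}), (\ref{triangle6}),

1784: (\ref{shorter}), and the fact that $\| S^\T \, Q^\T - G \, B \| = \| Q \, S - (G \, B)^\T \|$ (the spectral norm of a matrix equals that of its transpose) yields~(\ref{reconstruction22}).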

1785: \end{proof}
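Because Lemma~\ref{all_together22} is deterministic, it can be checked numerically for any particular choice of matrices. The Python/NumPy sketch below (helper name hypothetical) evaluates both sides of~(\ref{reconstruction22}) for randomly generated $A$ and $G$. It tries two choices of $F$: the least-squares minimizer of $\| F \, G \, B - A \|_F$, and a random matrix. It takes $Q$ to be $k$ leading left singular vectors of $(G \, B)^\T$ and sets $S = Q^\T \, (G \, B)^\T$. These particular choices are ours, made only for illustration.

```python
import numpy as np


def lemma_sides(A, Q, S, F, G, i):
    """Return (left side, right side) of the bound (reconstruction22)."""
    B = np.linalg.matrix_power(A @ A.T, i) @ A        # B = (A A^T)^i A
    GB = G @ B
    lhs = np.linalg.norm(A @ Q @ Q.T - A, 2)
    rhs = (2 * np.linalg.norm(F @ GB - A, 2)
           + 2 * np.linalg.norm(F, 2) * np.linalg.norm(Q @ S - GB.T, 2))
    return lhs, rhs


rng = np.random.default_rng(0)
m, n, k, l, i = 30, 40, 5, 7, 1                        # k <= l <= m <= n
A = rng.standard_normal((m, n))
G = rng.standard_normal((l, m))
GB = G @ np.linalg.matrix_power(A @ A.T, i) @ A
Q = np.linalg.svd(GB.T, full_matrices=False)[0][:, :k]   # n x k, orthonormal
S = Q.T @ GB.T                                           # k x l
F_lstsq = A @ np.linalg.pinv(GB)                         # minimizes ||F G B - A||_F
F_rand = rng.standard_normal((m, l))
for F in (F_lstsq, F_rand):
    lhs, rhs = lemma_sides(A, Q, S, F, G, i)
    print(f"{lhs:.3e} <= {rhs:.3e}")
```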

1786:

1787:

1788: The following lemma, stated earlier as Lemma~\ref{probability_bounds2}

1789: in Section~\ref{apparatus}, shows that,

1790: for any positive integer $i$, matrix $A$, and matrix $G$ whose entries are

1791: i.i.d.\ Gaussian random variables of zero mean and unit variance,

1792: with very high probability there exists a matrix $F$

1793: with a reasonably small norm,

1794: such that $F \, G \, (A \, A^\T)^i \, A$ is a good approximation to $A$.

1795: This lemma is similar to Lemma~19 of~\cite{martinsson-rokhlin-tygert3}.

1796:

1797: \begin{lemma}

1798: \label{probability_bounds22}

1799: Suppose that $i$, $j$, $k$, $l$, $m$, and~$n$ are positive integers

1800: with $j < k < l < m \le n$.

1801: Suppose further that $A$ is a real $m \times n$ matrix,

1802: $G$ is a real $l \times m$ matrix whose entries are

1803: i.i.d.\ Gaussian random variables of zero mean and unit variance,

1804: and $\beta$ and $\gamma$ are positive real numbers, such that

1805: the $j^\ith$ greatest singular value $\sigma_j$ of $A$ is positive,

1806: $\gamma > 1$, and

1807: %

1808: \begin{multline}

1809: \label{probability22}

1810: \Phi

1811:   = 1 - \frac{1}{\sqrt{2 \pi \, (l-j+1)}}

1812:  \, \left( \frac{e}{(l-j+1) \, \beta} \right)^{l-j+1} \\

1813:   - \frac{1}{4 \, (\gamma^2-1) \, \sqrt{\pi \, \max(m-k,l) \; \gamma^2}}

1814:     \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^{\max(m-k,\,l)} \\

1815:   - \frac{1}{4 \, (\gamma^2-1) \, \sqrt{\pi \, l \, \gamma^2}}

1816:     \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^l

1817: \end{multline}

1818: %

1819: is nonnegative.

1820:

1821: Then, there exists a real $m \times l$ matrix $F$ such that

1822: %

1823: \begin{multline}

1824: \label{approximation22}

1825: \| F \, G \, (A \, A^\T)^i \, A - A \|

1826: \le \sqrt{ 2 l^2 \, \beta^2 \, \gamma^2 + 1 }

1827:  \;\; \sigma_{j+1} \\

1828:   + \sqrt{ 2 l \, \max(m-k,l) \, \beta^2 \, \gamma^2

1829:         \, \left( \frac{\sigma_{k+1}}{\sigma_j} \right)^{4i} + 1 }

1830:  \;\; \sigma_{k+1}

1831: \end{multline}

1832: %

1833: and

1834: %

1835: \begin{equation}

1836: \label{small_norm22}

1837: \| F \| \le \frac{\sqrt{l} \; \beta}{(\sigma_j)^{2i}}

1838: \end{equation}

1839: %

1840: with probability not less than $\Phi$ defined in~(\ref{probability22}),

1841: where $\sigma_j$ is the $j^\ith$ greatest singular value of $A$,

1842: $\sigma_{j+1}$ is the $(j+1)^\ist$ greatest singular value of $A$,

1843: and $\sigma_{k+1}$ is the $(k+1)^\ist$ greatest singular value of $A$.

1844: \end{lemma}
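The probability bound $\Phi$ in~(\ref{probability22}) is easy to evaluate for specific parameters. The Python sketch below (function names hypothetical) computes the three subtracted terms in logarithmic form to avoid overflow. It returns $-\infty$ when a term is too large to represent, in which case the bound is vacuous. For an illustrative choice not taken from the experiments, $j = 9$, $k = 10$, $l = 12$, $m = 262144$, $\beta = 10$, and $\gamma = 2$, it gives $\Phi > 1 - 10^{-5}$. The last two terms are small for large $l$ or $m-k$ only when $2 \gamma^2 / e^{\gamma^2-1} < 1$.

```python
import math


def _log_tail_term(N, gamma):
    """log of (2 g^2 / e^(g^2 - 1))^N / (4 (g^2 - 1) sqrt(pi N g^2)), g = gamma."""
    g2 = gamma * gamma
    return (N * (math.log(2.0 * g2) - (g2 - 1.0))
            - math.log(4.0 * (g2 - 1.0) * math.sqrt(math.pi * N * g2)))


def phi(j, k, l, m, beta, gamma):
    """Lower bound Phi on the success probability in (probability22)."""
    if not (0 < j < k < l < m) or beta <= 0.0 or gamma <= 1.0:
        raise ValueError("need 0 < j < k < l < m, beta > 0 and gamma > 1")
    p = l - j + 1
    # log of (e / (p beta))^p / sqrt(2 pi p)
    log_first = p * (1.0 - math.log(p * beta)) - 0.5 * math.log(2.0 * math.pi * p)
    logs = (log_first,
            _log_tail_term(max(m - k, l), gamma),
            _log_tail_term(l, gamma))
    if max(logs) > 700.0:        # exp would overflow a double; bound is vacuous
        return -math.inf
    return 1.0 - sum(math.exp(t) for t in logs)
```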

1845:

1846: \begin{proof}

1847: We prove the existence of a matrix $F$ satisfying~(\ref{approximation22})

1848: and~(\ref{small_norm22}) by constructing one.

1849:

1850: We start by forming an SVD of $A$,

1851: %

1852: \begin{equation}

1853: \label{svd2}

1854: A = U \, \Sigma \, V^\T,

1855: \end{equation}

1856: %

1857: where $U$ is a real unitary $m \times m$ matrix,

1858: $\Sigma$ is a real diagonal $m \times m$ matrix,

1859: and $V$ is a real $n \times m$ matrix whose columns are orthonormal, such that

1860: %

1861: \begin{equation}

1862: \label{ordering2}

1863: \Sigma_{p,p} = \sigma_p

1864: \end{equation}

1865: %

1866: for $p = 1$,~$2$, \dots, $m-1$,~$m$,

1867: where $\Sigma_{p,p}$ is the entry in row $p$ and column $p$ of $\Sigma$,

1868: and $\sigma_p$ is the $p^\th$ greatest singular value of $A$.

1869:

1870: Next, we define auxiliary matrices

1871: $H$, $R$, $\Gamma$, $S$, $T$, $\Theta$, and $P$.

1872: We define $H$ to be the leftmost $l \times j$ block

1873: of the $l \times m$ matrix $G \, U$,

1874: $R$ to be the $l \times (k-j)$ block of $G \, U$

1875: whose first column is the $(j+1)^\st$ column of $G \, U$,

1876: and $\Gamma$ to be the rightmost $l \times (m-k)$ block

1877: of $G \, U$, so that

1878: %

1879: \begin{equation}

1880: \label{partition2}

1881: G \, U = \left( \begin{array}{c|c|c} H & R & \Gamma \end{array} \right).

1882: \end{equation}

1883: %

1884: Combining the fact that $U$ is real and unitary,

1885: and the fact that the entries of $G$ are i.i.d.\ Gaussian

1886: random variables of zero mean and unit variance,

1887: we see that the entries of $H$ are also i.i.d.\ Gaussian

1888: random variables of zero mean and unit variance,

1889: as are the entries of $R$, and as are the entries of $\Gamma$.

1890: We define $H^{(-1)}$ to be the real $j \times l$ matrix

1891: given by the formula

1892: %

1893: \begin{equation}

1894: \label{definition_of_pseudoinverse2}

1895: H^{(-1)} = (H^\T \, H)^{-1} \, H^\T

1896: \end{equation}

1897: %

1898: ($H^\T \, H$ is invertible with high probability

1899: due to Lemma~\ref{least_value}).

1900: %

1901: We define $S$ to be the leftmost uppermost $j \times j$ block of $\Sigma$,

1902: $T$ to be the $(k-j) \times (k-j)$ block of $\Sigma$

1903: whose leftmost uppermost entry is the entry

1904: in the $(j+1)^\st$ row and $(j+1)^\st$ column of $\Sigma$,

1905: and $\Theta$ to be the rightmost lowermost $(m-k) \times (m-k)$ block

1906: of $\Sigma$, so that

1907: %

1908: \begin{equation}

1909: \label{svd_partition2}

1910: \Sigma

1911: = \left( \begin{array}{c|c|c} S   & \0s & \0s    \\\hline

1912:                               \0s & T   & \0s    \\\hline

1913:                               \0s & \0s & \Theta

1914:          \end{array} \right).

1915: \end{equation}

1916: %

1917: We define $P$ to be the real $m \times l$ matrix

1918: whose uppermost $j \times l$ block is the product $S^{-2i} \, H^{(-1)}$,

1919: whose entries are zero in the $(k-j) \times l$ block whose first row

1920: is the $(j+1)^\st$ row of $P$,

1921: and whose entries in the lowermost $(m-k) \times l$ block are zero,

1922: so that

1923: %

1924: \begin{equation}

1925: \label{pad2}

1926: P = \left( \begin{array}{c} S^{-2i} \, H^{(-1)} \\\hline \0s

1927:                                                 \\\hline \0s

1928:            \end{array} \right).

1929: \end{equation}

1930:

1931: Finally, we define $F$ to be the $m \times l$ matrix given by

1932: %

1933: \begin{equation}

1934: \label{inverter2}

1935: F = U \, P = U \, \left( \begin{array}{c} S^{-2i} \, H^{(-1)} \\\hline

1936:                                           \0s \\\hline \0s

1937:                          \end{array} \right).

1938: \end{equation}

1939:

1940: Combining~(\ref{definition_of_pseudoinverse2}), (\ref{pseudoinverse_norm}),

1941: the fact that the entries of $H$ are i.i.d.\ Gaussian

1942: random variables of zero mean and unit variance,

1943: and Lemma~\ref{least_value} yields

1944: %

1945: \begin{equation}

1946: \label{pseudoinverse2}

1947: \left\| H^{(-1)} \right\| \le \sqrt{l} \; \beta

1948: \end{equation}

1949: %

1950: with probability not less than

1951: %

1952: \begin{equation}

1953: 1 - \frac{1}{\sqrt{2 \pi \, (l-j+1)}}

1954:  \, \left( \frac{e}{(l-j+1) \, \beta} \right)^{l-j+1}.

1955: \end{equation}

1956: %

1957: Combining~(\ref{inverter2}), (\ref{pseudoinverse2}), (\ref{svd_partition2}),

1958: (\ref{ordering2}), the fact that $\Sigma$ is zero off its main diagonal,

1959: and the fact that $U$ is unitary yields~(\ref{small_norm22}).

1960:

1961: We now show that $F$ defined in~(\ref{inverter2})

1962: satisfies~(\ref{approximation22}).

1963:

1964: Combining~(\ref{svd2}), (\ref{partition2}), and~(\ref{inverter2}) yields

1965: %

1966: \begin{equation}

1967: \label{simplification12}

1968: F \, G \, (A \, A^\T)^i \, A - A

1969: = U \, \left( \left( \begin{array}{c} S^{-2i} \, H^{(-1)} \\\hline

1970:                                       \0s \\\hline \0s

1971:                      \end{array} \right)

1972:               \left( \begin{array}{c|c|c} H & R & \Gamma \end{array} \right)

1973:               \, \Sigma^{2i}

1974:             - \Id \right) \, \Sigma \, V^\T.

1975: \end{equation}

1976: %

1977: Combining~(\ref{definition_of_pseudoinverse2})

1978: and~(\ref{svd_partition2}) yields

1979: %

1980: \begin{multline}

1981: \label{simplification22}

1982: \left( \left( \begin{array}{c} S^{-2i} \, H^{(-1)} \\\hline \0s

1983:                                                    \\\hline \0s

1984:               \end{array} \right)

1985:        \left( \begin{array}{c|c|c} H & R & \Gamma \end{array} \right)

1986:        \, \Sigma^{2i}

1987:      - \Id \right) \, \Sigma \\

1988: = \left( \begin{array}{c|c|c}

1989:          \0s & S^{-2i} \, H^{(-1)} \, R \; T^{2i+1} &

1990:                S^{-2i} \, H^{(-1)} \, \Gamma \, \Theta^{2i+1} \\\hline

1991:          \0s & -T & \0s \\\hline

1992:          \0s & \0s & -\Theta

1993:   \end{array} \right).

1994: \end{multline}

1995: %

1996: Furthermore,

1997: %

1998: \begin{multline}

1999: \label{Frobenius2}

2000: \left\| \left( \begin{array}{c|c|c}

2001:        \0s & S^{-2i} \, H^{(-1)} \, R \; T^{2i+1} &

2002:              S^{-2i} \, H^{(-1)} \, \Gamma \, \Theta^{2i+1} \\\hline

2003:        \0s & -T & \0s \\\hline

2004:        \0s & \0s & -\Theta

2005: \end{array} \right) \right\|^2 \\

2006: \le \left\| S^{-2i} \, H^{(-1)} \, R \, T^{2i+1} \right\|^2

2007:   + \left\| S^{-2i} \, H^{(-1)} \, \Gamma \, \Theta^{2i+1} \right\|^2

2008:   + \| T \|^2 + \| \Theta \|^2.

2009: \end{multline}

2010:

2011: Moreover,

2012: %

2013: \begin{equation}

2014: \label{product_of_norms2}

2015: \left\| S^{-2i} \, H^{(-1)} \, R \, T^{2i+1} \right\|

2016: \le \left\| S^{-1} \right\|^{2i} \, \left\| H^{(-1)} \right\|

2017:  \, \| R \| \, \| T \|^{2i+1}

2018: \end{equation}

2019: %

2020: and

2021: %

2022: \begin{equation}

2023: \label{product_of_norms3}

2024: \left\| S^{-2i} \, H^{(-1)} \, \Gamma \, \Theta^{2i+1} \right\|

2025: \le \left\| S^{-1} \right\|^{2i} \, \left\| H^{(-1)} \right\|

2026:  \, \| \Gamma \| \, \| \Theta \|^{2i+1}.

2027: \end{equation}

2028: %

2029: Combining~(\ref{svd_partition2}) and~(\ref{ordering2}) yields

2030: %

2031: \begin{equation}

2032: \label{singular_value_bound1}

2033: \left\| S^{-1} \right\| \le \frac{1}{\sigma_j},

2034: \end{equation}

2035: %

2036: \begin{equation}

2037: \label{singular_value_bound2}

2038: \| T \| \le \sigma_{j+1},

2039: \end{equation}

2040: %

2041: and

2042: %

2043: \begin{equation}

2044: \label{singular_value_bound3}

2045: \| \Theta \| \le \sigma_{k+1}.

2046: \end{equation}

2047: %

2048: Combining~(\ref{simplification12})--(\ref{singular_value_bound3})

2049: and the fact that the columns of $U$ are orthonormal,

2050: as are the columns of $V$, yields

2051: %

2052: \begin{multline}

2053: \label{almost_there2}

2054: \| F \, G \, (A \, A^\T)^i \, A - A \|^2

2055: \le \left( \left\| H^{(-1)} \right\|^2 \, \| R \|^2

2056:         \, \left( \frac{\sigma_{j+1}}{\sigma_j} \right)^{4i} + 1 \right)

2057:  \, (\sigma_{j+1})^2 \\

2058:   + \left( \left\| H^{(-1)} \right\|^2 \, \| \Gamma \|^2

2059:         \, \left( \frac{\sigma_{k+1}}{\sigma_j} \right)^{4i} + 1 \right)

2060:  \, (\sigma_{k+1})^2.

2061: \end{multline}

2062:

2063: Combining Lemma~\ref{greatest_value}

2064: and the fact that the entries of $R$ are

2065: i.i.d.\ Gaussian random variables of zero mean and unit variance,

2066: as are the entries of $\Gamma$, and the fact that $k-j < l$ (so that $\max(k-j,l) = l$), yields

2067: %

2068: \begin{equation}

2069: \label{residual2}

2070: \| R \| \le \sqrt{2l} \; \gamma

2071: \end{equation}

2072: %

2073: and

2074: %

2075: \begin{equation}

2076: \label{residual3}

2077: \| \Gamma \| \le \sqrt{2 \, \max(m-k,l)} \; \gamma,

2078: \end{equation}

2079: %

2080: with probability not less than

2081: %

2082: \begin{multline}

2083: 1 - \frac{1}{4 \, (\gamma^2-1) \, \sqrt{\pi \, \max(m-k,l) \, \gamma^2}}

2084:     \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^{\max(m-k,\,l)} \\

2085:   - \frac{1}{4 \, (\gamma^2-1) \, \sqrt{\pi \, l \, \gamma^2}}

2086:     \left( \frac{2 \gamma^2}{e^{\gamma^2-1}} \right)^l.

2087: \end{multline}

2088: %

2089: Combining~(\ref{almost_there2}), (\ref{pseudoinverse2}),

2090: (\ref{residual2}), and~(\ref{residual3}) yields

2091: %

2092: \begin{multline}

2093: \label{pre-approximation2}

2094: \| F \, G \, (A \, A^\T)^i \, A - A \|^2

2095: \le \left( 2 l^2 \, \beta^2 \, \gamma^2

2096:         \, \left( \frac{\sigma_{j+1}}{\sigma_j} \right)^{4i} + 1 \right)

2097:  \, (\sigma_{j+1})^2 \\

2098:   + \left( 2 l \, \max(m-k,l) \, \beta^2 \, \gamma^2

2099:         \, \left( \frac{\sigma_{k+1}}{\sigma_j} \right)^{4i} + 1 \right)

2100:  \, (\sigma_{k+1})^2

2101: \end{multline}

2102: %

2103: with probability not less than $\Phi$ defined in~(\ref{probability22}).

2104: Combining~(\ref{pre-approximation2}),

2105: the fact that $\sigma_{j+1} \le \sigma_j$, and the fact that

2106: %

2107: \begin{equation}

2108: \sqrt{x + y} \le \sqrt{x} + \sqrt{y}

2109: \end{equation}

2110: %

2111: for any nonnegative real numbers $x$ and $y$

2112: yields~(\ref{approximation22}).

2113: \end{proof}
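The construction of $F$ in this proof can also be carried out numerically. The following Python/NumPy sketch (function and variable names hypothetical) starts from an SVD $A = U \, \Sigma \, V^\T$ and builds $H$, $R$, $\Gamma$, $H^{(-1)}$, and $F = U \, P$ as in~(\ref{partition2})--(\ref{inverter2}). It then checks two deterministic facts. The first is the inequality~(\ref{almost_there2}). The second is the estimate $\| F \| \le \| H^{(-1)} \| / (\sigma_j)^{2i}$, which combined with~(\ref{pseudoinverse2}) gives~(\ref{small_norm22}). The matrix sizes are arbitrary illustrative choices satisfying $j < k < l < m \le n$.

```python
import numpy as np


def build_F(U, sigma, G, i, j, k):
    """Construct F = U P as in the proof, for A = U diag(sigma) V^T.

    U is m x m orthogonal, sigma holds the singular values in decreasing
    order, and G is l x m.  Returns F with the blocks H^(-1), R and Gamma.
    """
    m, l = U.shape[0], G.shape[0]
    GU = G @ U
    H, R, Gamma = GU[:, :j], GU[:, j:k], GU[:, k:]     # (partition2)
    H_inv = np.linalg.solve(H.T @ H, H.T)              # (H^T H)^{-1} H^T
    P = np.zeros((m, l))
    P[:j, :] = (sigma[:j] ** (-2 * i))[:, None] * H_inv   # S^{-2i} H^{(-1)}
    return U @ P, H_inv, R, Gamma


rng = np.random.default_rng(0)
m, n, l, i, j, k = 40, 60, 9, 1, 3, 6
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, m)))       # n x m, orthonormal columns
sigma = np.geomspace(1.0, 1e-3, m)
A = (U * sigma) @ V.T
G = rng.standard_normal((l, m))
F, H_inv, R, Gamma = build_F(U, sigma, G, i, j, k)

B = np.linalg.matrix_power(A @ A.T, i) @ A
err = np.linalg.norm(F @ G @ B - A, 2)
nH, s_j, s_j1, s_k1 = np.linalg.norm(H_inv, 2), sigma[j - 1], sigma[j], sigma[k]
bound = np.sqrt(
    (nH**2 * np.linalg.norm(R, 2)**2 * (s_j1 / s_j)**(4 * i) + 1) * s_j1**2
    + (nH**2 * np.linalg.norm(Gamma, 2)**2 * (s_k1 / s_j)**(4 * i) + 1) * s_k1**2)
print(err, bound, np.linalg.norm(F, 2), nH / s_j**(2 * i))
```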

2114:

2115:

2116:

2117: \section*{Acknowledgements}

2118: We thank Ming Gu for suggesting the combination

2119: of the Lanczos method with randomized methods

2120: for the low-rank approximation of matrices.

2121: We are grateful for many helpful discussions

2122: with R. Raphael Coifman and Yoel Shkolnisky.

2123: We thank the anonymous referees for their useful suggestions.

2124:

2125:

2126:

2127: \begin{figure}[b]

2128: \begin{center}

2129: %

2130: %

2131: \begin{tabular}{r|r|c|r|r|r}

2132:    $m$ &     $n$ & $i$ &      $t$ & $\sigma_{k+1}$ & $\delta$ \\\hline

2133:                                                                 \hline

2134:    512 &    1024 &   1 & .13E--01 &           .001 &    .0011 \\\hline

2135:   2048 &    4096 &   1 & .56E--01 &           .001 &    .0013 \\\hline

2136:   8192 &   16384 &   1 & .25E--00 &           .001 &    .0018 \\\hline

2137:  32768 &   65536 &   1 &  .12E+01 &           .001 &    .0024 \\\hline

2138: 131072 &  262144 &   1 &  .75E+01 &           .001 &    .0037 \\\hline

2139: 524288 & 1048576 &   1 &  .36E+02 &           .001 &    .0039 \\\hline

2140: \end{tabular}

2141: %

2142: %

2143: \\\vspace{.125in}

2144: %

2145: Table~1: Five-step algorithm of Subsection~\ref{main_algorithm}

2146: %

2147: %

2148: \end{center}

2149: \end{figure}

2150:

2151:

2152:

2153: \begin{figure}

2154: \begin{center}

2155: %

2156: %

2157: \begin{tabular}{r|r|c|r|r|r}

2158:     $m$ &    $n$ & $i$ &      $t$ & $\sigma_{k+1}$ & $\delta$ \\\hline

2159:                                                                 \hline

2160:    512 &    1024 &   0 & .14E--01 &           .001 &     .012 \\\hline

2161:   2048 &    4096 &   0 & .47E--01 &           .001 &     .027 \\\hline

2162:   8192 &   16384 &   0 & .22E--00 &           .001 &     .039 \\\hline

2163:  32768 &   65536 &   0 &  .10E+01 &           .001 &     .053 \\\hline

2164: 131072 &  262144 &   0 &  .60E+01 &           .001 &     .110 \\\hline

2165: 524288 & 1048576 &   0 &  .29E+02 &           .001 &     .220 \\\hline

2166: \end{tabular}

2167: %

2168: %

2169: \\\vspace{.125in}

2170: %

2171: Table~2: Five-step algorithm of Subsection~\ref{main_algorithm}

2172: %

2173: %

2174: \end{center}

2175: \end{figure}

2176:

2177:

2178:

2179: \begin{figure}

2180: \begin{center}

2181: %

2182: %

2183: \begin{tabular}{r|r|c|r|r|r}

2184:    $m$ &     $n$ & $i$ &     $t$ & $\sigma_{k+1}$ & $\delta$ \\\hline

2185:                                                                \hline

2186: 524288 & 1048576 &   0 & .29E+02 &            .01 &     .862 \\\hline

2187: 524288 & 1048576 & (1) & .31E+02 &            .01 &     .091 \\\hline

2188: 524288 & 1048576 &   1 & .36E+02 &            .01 &     .037 \\\hline

2189: 524288 & 1048576 & (2) & .38E+02 &            .01 &     .025 \\\hline

2190: 524288 & 1048576 &   2 & .43E+02 &            .01 &     .022 \\\hline

2191: 524288 & 1048576 & (3) & .45E+02 &            .01 &     .015 \\\hline

2192: 524288 & 1048576 &   3 & .49E+02 &            .01 &     .010 \\\hline

2193: \end{tabular}

2194: %

2195: %

2196: \\\vspace{.125in}

2197: %

2198: Table~3: Five-step algorithms of Subsections~\ref{main_algorithm}

2199:          and~\ref{modified} \\\quad\quad\quad\quad

2200:          (parentheses around $i$ designate Subsection~\ref{modified})

2201: %

2202: %

2203: \end{center}

2204: \end{figure}

2205:

2206:

2207:

2208: \begin{figure}

2209: \begin{center}

2210: %

2211: %

2212: \begin{tabular}{r|r|c|r|r|r}

2213:    $m$ &    $n$ & $i$ &     $t$ & $\sigma_{k+1}$ & $\delta$ \\\hline

2214:                                                               \hline

2215: 262144 & 524288 &   1 & .17E+02 &       .10E--02 & .39E--02 \\\hline

2216: 262144 & 524288 &   1 & .17E+02 &       .10E--04 & .10E--03 \\\hline

2217: 262144 & 524288 &   1 & .17E+02 &       .10E--06 & .25E--05 \\\hline

2218: 262144 & 524288 &   1 & .17E+02 &       .10E--08 & .90E--06 \\\hline

2219: 262144 & 524288 &   1 & .17E+02 &       .10E--10 & .55E--07 \\\hline

2220: 262144 & 524288 &   1 & .17E+02 &       .10E--12 & .51E--08 \\\hline

2221: 262144 & 524288 &   1 & .17E+02 &       .10E--14 & .10E--05 \\\hline

2222: \end{tabular}

2223: %

2224: %

2225: \\\vspace{.125in}

2226: %

2227: Table~4: Five-step algorithm of Subsection~\ref{main_algorithm}

2228: %

2229: %

2230: \end{center}

2231: \end{figure}

2232:

2233:

2234:

2235: \begin{figure}

2236: \begin{center}

2237: %

2238: %

2239: \begin{tabular}{r|r|c|r|r|r}

2240:    $m$ &    $n$ & $i$ &     $t$ & $\sigma_{k+1}$ &   $\delta$ \\\hline

2241:                                                                 \hline

2242: 262144 & 524288 &   1 & .31E+02 &       .10E--02 &   .35E--02 \\\hline

2243: 262144 & 524288 &   1 & .31E+02 &       .10E--04 &   .15E--04 \\\hline

2244: 262144 & 524288 &   1 & .31E+02 &       .10E--06 &   .24E--05 \\\hline

2245: 262144 & 524288 &   1 & .31E+02 &       .10E--08 &   .11E--06 \\\hline

2246: 262144 & 524288 &   1 & .31E+02 &       .10E--10 &   .19E--08 \\\hline

2247: 262144 & 524288 &   1 & .31E+02 &       .10E--12 &   .25E--10 \\\hline

2248: 262144 & 524288 &   1 & .31E+02 &       .10E--14 &   .53E--11 \\\hline

2249: \end{tabular}

2250: %

2251: %

2252: \\\vspace{.125in}

2253: %

2254: Table~5: Five-step algorithm of Subsection~\ref{blanczos}

2255: %

2256: %

2257: \end{center}

2258: \end{figure}

2259:

2260:

2261:

2262: \begin{figure}

2263: \begin{center}

2264: %

2265: %

2266: \begin{tabular}{r|r|r|r|r}

2267:  $m$ &  $n$ &      $t$ & $\sigma_{k+1}$ & $\delta$ \\\hline

2268:                                                      \hline

2269:  512 & 1024 & .60E--01 &           .001 &    .0047 \\\hline

2270: 1024 & 2048 & .29E--00 &           .001 &    .0065 \\\hline

2271: 2048 & 4096 &  .11E+01 &           .001 &    .0092 \\\hline

2272: 4096 & 8192 &  .43E+01 &           .001 &    .0131 \\\hline

2273: \end{tabular}

2274: %

2275: %

2276: \\\vspace{.125in}

2277: %

2278: Table~6: Pivoted $QR$-decomposition

2279: %

2280: %

2281: \end{center}

2282: \end{figure}

2283:

2284:

2285:

2286: \begin{figure}

2287: \begin{center}

2288: %

2289: %

2290: \rotatebox{-90}{\scalebox{.28}{\includegraphics{plot}}}

2291: %

2292: %

2293: \\\vspace{.15in}

2294: %

2295: Figure~1: Singular values with $m = 512$, $n = 1024$, \\

2296:           and $\sigma_{k+1} = .001$

2297: %

2298: %

2299: \end{center}

2300: \end{figure}

2301:

2302:

2303:

2304: \clearpage

2305:

2306:

2307: \bibliographystyle{siam}

2308: \bibliography{pca}

2309:

2310:

2311: \end{document}

2312: