\begin{abstract}
  Factor modeling is an essential tool for exploring intrinsic dependence structures among high-dimensional
  random variables. Much progress has been made in estimating the covariance matrix from a high-dimensional
  factor model. However, the blessing of dimensionality has not yet been fully embraced in the literature:
  much of the available data is often ignored in constructing covariance matrix estimates. If our goal is to
  accurately estimate the covariance matrix of a set of targeted variables, should we employ additional data,
  beyond the variables of interest, in the estimation? In this paper, we provide sufficient
  conditions for an affirmative answer, and further quantify the gain in terms of Fisher information and
  convergence rate. In fact, even an oracle-like result (as if all the factors were known) can be achieved
  when a sufficiently large number of variables is used. Using as much data as possible brings
  computational challenges. A divide-and-conquer algorithm is thus proposed to alleviate the computational
  burden, and is shown not to sacrifice any statistical accuracy in comparison with a pooled
  analysis. Simulation studies further confirm our advocacy for the use of full data, and demonstrate the
  effectiveness of the above algorithm. Our proposal is applied to a microarray data example that shows
  the empirical benefits of using more data.
\end{abstract}
