
\begin{abstract}
  Factor modeling is an essential tool for exploring intrinsic dependence structures among high-dimensional
  random variables. Much progress has been made in estimating the covariance matrix from a high-dimensional
  factor model. However, the blessing of dimensionality has not yet been fully embraced in the literature:
  much of the available data is often ignored in constructing covariance matrix estimates. If our goal is to
  accurately estimate the covariance matrix of a set of targeted variables, should we employ additional data,
  beyond the variables of interest, in the estimation? In this paper, we provide sufficient
  conditions for an affirmative answer, and further quantify the gain in terms of Fisher information and
  convergence rate. In fact, even an oracle-like result (as if all the factors were known) can be achieved
  when a sufficiently large number of variables is used. Using as much data as possible, however, brings
  computational challenges. A divide-and-conquer algorithm is thus proposed to alleviate the computational
  burden, and is shown not to sacrifice any statistical accuracy in comparison with a pooled
  analysis. Simulation studies further confirm our advocacy for the use of full data, and demonstrate the
  effectiveness of the proposed algorithm. Our proposal is applied to a microarray data example that shows
  the empirical benefits of using more data.
\end{abstract}