
\begin{abstract}

The divide-and-conquer method is a common strategy for handling massive data. In this article, we study the divide-and-conquer method for cube-root-rate estimators under the massive data framework. We develop a general theory for establishing the asymptotic distribution of the aggregated M-estimators obtained by simple averaging. Under a certain condition on the growth rate of the number of subgroups, the resulting aggregated estimators are shown to have a faster convergence rate and an asymptotic normal distribution, which make them more tractable in both computation and inference than the original M-estimators based on the pooled data.
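Concretely, suppose the full sample of $N$ observations is split at random into $K$ subgroups, and let $\hat{\theta}^{(k)}$ denote the M-estimator computed from the $k$-th subgroup alone (this notation is introduced here only for illustration). The aggregated estimator is then the simple average
\[
\bar{\theta} = \frac{1}{K} \sum_{k=1}^{K} \hat{\theta}^{(k)}.
\]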

%Instead of pooling all the data to obtain the ``pooled'' estimator, we propose to randomly divide the data into several subgroups, compute the estimator within each subgroup, and then aggregate all estimators using a simple average. We develop a general central limit theorem stating that, unlike the ``pooled'' estimator, our proposed estimator converges much faster and has an asymptotic normal distribution. This is in contrast to most aggregation-type estimators proposed in the divide-and-conquer literature, which are shown to perform only comparably to the ``pooled'' estimator. This asymptotic result holds when the number of subgroups does not grow too fast. We also propose a simple estimator of the asymptotic variance-covariance matrix of our estimator.

Our theory applies to a wide class of M-estimators with a cube-root convergence rate, including the location estimator, the maximum score estimator and the value search estimator. The empirical performance in simulations also validates our theoretical findings.

\end{abstract}