\begin{abstract}
The bootstrap provides a simple and powerful means of assessing the quality of
estimators.  However, in settings involving large datasets---which are increasingly
prevalent---the computation of bootstrap-based quantities can be prohibitively
demanding.  While variants such as subsampling and the $m$ out
of $n$ bootstrap can be used in principle to reduce the cost of bootstrap
computations, we find that these methods are generally not robust to the specification
of hyperparameters (such as the number of subsampled data points), and they often
require more prior information (such as rates of convergence of estimators)
than the bootstrap.  As an alternative, we introduce \ouralgWithAbbrev, a new
procedure that incorporates features of both the bootstrap and subsampling
to yield a robust, computationally efficient means of assessing the quality of estimators.
\OuralgAbbrev is well suited to modern parallel and distributed computing architectures
and furthermore retains the generic applicability and statistical efficiency of
the bootstrap.  We demonstrate \ouralgAbbrev's favorable statistical performance
via a theoretical analysis elucidating the procedure's properties, as well as a
simulation study comparing \ouralgAbbrev to the bootstrap, the $m$ out of $n$
bootstrap, and subsampling.  In addition, we present results from a large-scale
distributed implementation of \ouralgAbbrev demonstrating its computational
superiority on massive data, a method for adaptively selecting \ouralgAbbrev's
hyperparameters, an empirical study applying \ouralgAbbrev to several real datasets,
and an extension of \ouralgAbbrev to time series data.
\end{abstract}