\begin{abstract}
The bootstrap provides a simple and powerful means of assessing the quality of
estimators.  However, in settings involving large datasets, which are increasingly
prevalent, the computation of bootstrap-based quantities can be prohibitively
demanding.  While variants such as subsampling and the $m$ out
of $n$ bootstrap can be used in principle to reduce the cost of bootstrap
computations, we find that these methods are generally not robust to specification
of hyperparameters (such as the number of subsampled data points), and they often
require use of more prior information (such as rates of convergence of estimators)
than the bootstrap.  As an alternative, we introduce \ouralgWithAbbrev, a new
procedure that incorporates features of both the bootstrap and subsampling
to yield a robust, computationally efficient means of assessing the quality of
estimators.  \OuralgAbbrev is well suited to modern parallel and distributed
computing architectures
and furthermore retains the generic applicability and statistical efficiency of
the bootstrap.  We demonstrate \ouralgAbbrev's favorable statistical performance
via a theoretical analysis elucidating the procedure's properties, as well as a
simulation study comparing \ouralgAbbrev to the bootstrap, the $m$ out of $n$
bootstrap, and subsampling.  In addition, we present results from a large-scale
distributed implementation of \ouralgAbbrev demonstrating its computational
superiority on massive data, a method for adaptively selecting \ouralgAbbrev's
hyperparameters, an empirical study applying \ouralgAbbrev to several real datasets,
and an extension of \ouralgAbbrev to time series data.
\end{abstract}