\begin{abstract}
Although bagging and random forests are among the most widely used prediction methods, relatively little is known about their algorithmic convergence.
In particular, there are few theoretical guarantees for deciding when an ensemble is ``large enough'',
that is, when its accuracy is close to that of an ideal infinite ensemble.
%
Because bagging and random forests are randomized algorithms, the choice of ensemble size is closely related to the notion of ``algorithmic variance'' (i.e.~the variance of prediction error due only to the training algorithm). In the present work, we propose a bootstrap method to estimate this variance for bagging, random forests, and related methods in the context of classification.
%
%
To be specific, suppose the training dataset is fixed, and let the random variable $\Err_t$ denote the prediction error of a randomized ensemble of size $t$.
Working under a ``first-order model'' for randomized ensembles, we prove that the centered law of $\Err_t$ can be consistently approximated by the proposed method as $t\to\infty$. Moreover, the computational cost of the method is modest, owing to an extrapolation technique. As a consequence, the method offers a practical guideline for deciding when the algorithmic fluctuations of $\Err_t$ are negligible.

\end{abstract}
