\begin{abstract}
We study the convergence of the predictive surface of regression trees and forests.
To support our analysis we introduce a notion of adaptive concentration for regression trees.
This approach breaks tree training into a model selection phase in which we pick the tree splits, followed by a model fitting phase where we find the best regression model consistent with these splits.
We then show that the fitted regression tree concentrates around the optimal predictor with the same splits:
as $d$ and $n$ get large, the discrepancy is with high probability bounded on the order of $\sqrt{\log(d)\log(n)/k}$ uniformly over the whole regression surface, where $d$ is the dimension of the feature space, $n$ is the number of training examples, and $k$ is the minimum leaf size for each tree.
We also provide rate-matching lower bounds for this adaptive concentration statement.
From a practical perspective, our result enables us to prove consistency results for adaptively grown forests in high dimensions,
and to carry out valid post-selection inference in the sense of Berk et al. [2013] for subgroups defined by tree leaves.
\end{abstract}
