\begin{abstract}

Gradient boosting performs exceptionally well on most prediction problems and scales well to large datasets.
In this paper we prove that a ``lassoed'' gradient boosted tree algorithm with early stopping achieves faster than $n^{-1/4}$ convergence in $L_2$ norm in the large nonparametric space of c\`adl\`ag functions of bounded sectional variation.
This rate is remarkable because it does not depend on the dimension, sparsity, or smoothness.
We use simulation and real data to confirm our theory and demonstrate empirical performance and scalability on par with standard boosting.
Our convergence proofs are based on a novel, general theorem on early stopping with empirical loss minimizers of nested Donsker classes.
\end{abstract}
