7c6bba60f301f50b.tex
1: \begin{abstract}
2: Ensemble learning is a statistical paradigm 
3: %in which multiple learners are trained to solve the same problem.
4: %Its success is 
5: built on the premise that  many weak learners can perform exceptionally well when deployed collectively.
6: %These methods have had a decided impact on 
7: %The success of ensemble learning can be attributed  to the fact that  many weak learners can perform exceptionally well when deployed collectively.
8: The BART method of \cite{chipman2010bart} is a prominent example of {\sl Bayesian} ensemble learning, where each learner is a tree.
9: Due to its impressive performance, BART has received a lot of attention from practitioners.
10: Despite its wide popularity, however, theoretical studies of BART have  begun emerging only very recently.
11: Laying the foundations for the theoretical analysis of Bayesian forests,  \cite{rockova2017posterior} showed optimal posterior concentration   under {\sl conditionally uniform tree priors.}
12: These priors  deviate from the actual priors implemented in BART. Here, we study the exact BART prior and propose a simple modification so that  it {\sl also} enjoys optimality properties.
13: To this end, we dive into branching process theory. We obtain  tail bounds for the distribution of total progeny under heterogeneous Galton-Watson (GW) processes exploiting their connection to random walks.
14: We conclude with a result stating the optimal rate of posterior convergence for BART.
15: \end{abstract}
16: