1: \begin{abstract}
2: Ensemble learning is a statistical paradigm
3: %in which multiple learners are trained to solve the same problem.
4: %Its success is
5: built on the premise that many weak learners can perform exceptionally well when deployed collectively.
6: %These methods have had a decided impact on
7: %The success of ensemble learning can be attributed to the fact that many weak learners can perform exceptionally well when deployed collectively.
8: The BART method of \cite{chipman2010bart} is a prominent example of {\sl Bayesian} ensemble learning, where each learner is a tree.
9: Due to its impressive performance, BART has received a lot of attention from practitioners.
10: Despite its wide popularity, however, theoretical studies of BART have begun emerging only very recently.
11: Laying the foundations for the theoretical analysis of Bayesian forests, \cite{rockova2017posterior} showed optimal posterior concentration under {\sl conditionally uniform tree priors.}
12: These priors deviate from the actual priors implemented in BART. Here, we study the exact BART prior and propose a simple modification so that it {\sl also} enjoys optimality properties.
13: To this end, we dive into branching process theory. We obtain tail bounds for the distribution of total progeny under heterogeneous Galton-Watson (GW) processes exploiting their connection to random walks.
14: We conclude with a result stating the optimal rate of posterior convergence for BART.
15: \end{abstract}
16: