\begin{abstract}
In this letter, we show how to improve the performance of backward-chained behavior trees (BTs) that use reinforcement learning (RL).
BTs provide a hierarchical and modular way of combining control policies into higher-level control policies. Backward chaining is a design principle for constructing BTs that combine reactivity with goal-directed actions in a structured way.
The backward-chained structure has also enabled convergence proofs for BTs, which identify a set of local conditions under which all trajectories converge to a set of desired goal states.

The key idea of this letter is to improve the performance of backward-chained BTs by
using the conditions identified in a theoretical convergence proof to set up the RL problems for individual controllers.
In particular, previous analysis identified so-called active constraint conditions (ACCs), which should not be violated if the BT is to avoid returning to work on previously achieved subgoals.
We propose a way to set up the RL problems such that the learned controllers not only achieve their immediate subgoals but also avoid violating the identified ACCs.
The resulting performance improvement depends on how often ACC violations occurred before the change and how much effort was needed to re-achieve the violated ACCs.
The proposed approach is illustrated in a dynamic simulation environment.
\end{abstract}