
\begin{abstract}

In this letter, we show how to improve the performance of backward chained behavior trees (BTs) that use reinforcement learning (RL).

BTs represent a hierarchical and modular way of combining control policies into higher-level control policies. Backward chaining is a design principle for the construction of BTs that combine reactivity with goal-directed actions in a structured way.

The backward chained structure has also enabled convergence proofs for BTs, identifying a set of local conditions that lead to the convergence of all trajectories to a set of desired goal states.


The key idea of this letter is to improve the performance of backward chained BTs by using the conditions identified in a theoretical convergence proof to set up the RL problems for individual controllers.

In particular, previous analysis identified so-called active constraint conditions (ACCs) that should not be violated, in order to avoid having to return to work on previously achieved subgoals.

We propose a way to set up the RL problems so that they not only achieve each immediate subgoal but also avoid violating the identified ACCs.

The resulting performance improvement depends on how often ACC violations occurred before the change and on how much effort was needed to re-achieve the violated conditions.

The proposed approach is illustrated in a dynamic simulation environment.

\end{abstract}
