\begin{abstract}
Deep reinforcement learning (RL) uses model-free techniques to optimize
task-specific control policies. Despite having emerged as a promising approach
for complex problems, RL is still hard to use reliably in real-world
applications. Apart from challenges such as the need for precise reward
function tuning, inaccurate sensing and actuation, and non-deterministic
response, existing RL methods do not guarantee behavior within the safety
constraints that are crucial for real robot scenarios. To address this, we
introduce guided constrained policy optimization (GCPO), an RL framework based
on our implementation of constrained proximal policy optimization (CPPO), for
tracking base velocity commands while satisfying the defined constraints. We
also introduce schemes that encourage state recovery into the constrained
regions when constraints are violated. We present experimental results of our
training method and test it on the real ANYmal quadruped robot. We compare our
approach against the unconstrained RL method and show that guided constrained
RL converges faster to a solution close to the desired optimum, resulting in
optimal, yet physically feasible, robotic control behavior without the need
for precise reward function tuning.
\end{abstract}