\begin{abstract}
Deep reinforcement learning (RL) uses model-free techniques to optimize
task-specific control policies. Despite having emerged as a promising approach
for complex problems, RL is still hard to use reliably for real-world
applications. Apart from challenges such as precise reward function tuning,
inaccurate sensing and actuation, and non-deterministic response, existing RL
methods do not guarantee behavior within required safety constraints that are
crucial for real robot scenarios. In this regard, we introduce guided
constrained policy optimization (GCPO), an RL framework based upon our
implementation of constrained proximal policy optimization (CPPO) for tracking
base velocity commands while following the defined constraints. We introduce
schemes which encourage state recovery into constrained regions in case of
constraint violations. We present experimental results of our training method
and test it on the real ANYmal quadruped robot. We compare our approach
against the unconstrained RL method and show that guided constrained RL offers
faster convergence close to the desired optimum, resulting in an optimal, yet
physically feasible, robotic control behavior without the need for precise
reward function tuning.
\end{abstract}