\begin{abstract}
The main challenge in developing effective reinforcement learning (RL) pipelines is often the design and tuning of reward functions.
A well-designed shaping reward can lead to significantly faster learning.
Naively formulated rewards, however, can conflict with the desired behavior and result in overfitting or even erratic performance if not properly tuned.
In theory, the broad class of \emph{potential-based reward shaping} (PBRS) can help guide the learning process without affecting the optimal policy.
Although several studies have explored the use of potential-based reward shaping to accelerate learning convergence, most have been limited to grid-worlds and low-dimensional systems, and RL in robotics has predominantly relied on standard forms of reward shaping.
\par
In this paper, we benchmark standard forms of shaping against PBRS for a humanoid robot.
We find that in this high-dimensional system, PBRS has only marginal benefits in convergence speed.
However, the PBRS reward terms are significantly more robust to scaling than typical reward shaping approaches, and thus easier to tune.
% Qualitatively, policies trained with standard reward shaping terms are more prone to overfit to the shaping reward terms, while those trained with the PBRS terms are less affected.
\end{abstract}