d09faa401d8e4ec9.tex
1: \begin{abstract}
2: 
3: The main challenge in developing effective reinforcement learning (RL) pipelines is often the design and tuning of reward functions.
4: Well-designed shaping rewards can lead to significantly faster learning.
5: Naively formulated rewards, however, can conflict with the desired behavior and result in overfitting or even erratic performance if not properly tuned.
6: In theory, the broad class of \emph{potential-based reward shaping} (PBRS) can help guide the learning process without affecting the optimal policy.
7: Although several studies have explored PBRS as a way to accelerate convergence, most have been limited to grid-worlds and low-dimensional systems, and RL in robotics has predominantly relied on standard forms of reward shaping.
8: \par
9: In this paper, we benchmark standard forms of reward shaping against PBRS for a humanoid robot.
10: We find that in this high-dimensional system, PBRS has only marginal benefits in convergence speed.
11: However, the PBRS reward terms are significantly more robust to scaling than typical reward shaping approaches, and thus easier to tune.
12: % Qualitatively, policies trained with standard reward shaping terms are more prone to overfit to the shaping reward terms, while those trained with the PBRS terms are less affected.
13: 
14: \end{abstract}
15: