\begin{abstract}
%
Reinforcement learning (RL) is a popular approach for robotic path planning
in uncertain environments. However, the control policies trained for an
RL agent depend crucially on user-defined, state-based reward functions.
Poorly designed rewards can lead to policies that attain maximal reward
but fail to satisfy the desired task objectives or are unsafe. Formal
languages such as temporal logics and automata have been used to specify
high-level tasks for robots (in lieu of Markovian rewards). Recent efforts
have focused on inferring state-based rewards from formal specifications;
here, the goal is to provide (probabilistic) guarantees that the policy
learned using RL (with the inferred rewards) satisfies the high-level
formal specification. A key drawback of several of these techniques is
that the rewards they infer are sparse: the agent receives a positive
reward only upon completion of the task and no reward otherwise. This
naturally leads to poor convergence properties and high variance during RL.

In this work, we propose using formal specifications in the form of symbolic
automata: these generalize both bounded-time temporal-logic-based
specifications and automata. Furthermore, our use of symbolic automata
allows us to define non-sparse, potential-based rewards which, empirically,
shape the reward surface and lead to better convergence during RL. We also
show that our potential-based reward strategy still allows us to obtain
the policy that maximizes the satisfaction of the given specification.

\end{abstract}
