\begin{abstract}
  %
  Reinforcement learning (RL) is a popular approach for robotic path planning
  in uncertain environments. However, the control policies trained for an
  RL agent crucially depend on user-defined, state-based reward functions.
  Poorly designed rewards can lead to policies that attain maximal reward
  but fail to satisfy the desired task objectives or are unsafe. Several
  works have used formal languages, such as temporal logics and automata,
  to specify high-level tasks for robots (in lieu of Markovian rewards).
  Recent efforts have focused on inferring state-based rewards from formal
  specifications; here, the goal is to provide (probabilistic) guarantees
  that the policy learned using RL (with the inferred rewards) satisfies
  the high-level formal specification. A key drawback of several of these
  techniques is that the rewards they infer are sparse: the agent receives
  a positive reward only upon completing the task and no reward otherwise.
  This naturally leads to poor convergence properties and high variance
  during RL.

  In this work, we propose using formal specifications in the form of
  symbolic automata, which generalize both bounded-time temporal-logic-based
  specifications and automata. Furthermore, our use of symbolic automata
  allows us to define non-sparse potential-based rewards that shape the
  reward surface and empirically lead to better convergence during RL.
  We also show that our potential-based reward-shaping strategy still
  yields the policy that maximizes the satisfaction of the given
  specification.

\end{abstract}
