\begin{abstract}
  %
  Reinforcement learning (RL) is a popular approach for robotic path planning
  in uncertain environments. However, the control policies trained for an
  RL agent depend crucially on user-defined, state-based reward functions.
  Poorly designed rewards can lead to policies that attain maximal reward
  but fail to satisfy the desired task objectives or are unsafe. Formal
  languages such as temporal logics and automata have been used to specify
  high-level tasks for robots (in lieu of Markovian rewards). Recent efforts
  have focused on inferring state-based rewards
  from formal specifications; here, the goal is to provide (probabilistic)
  guarantees that the policy learned using RL (with the inferred rewards)
  satisfies the high-level formal specification. A key drawback of several
  of these techniques is that the rewards they infer are sparse:
  the agent receives a positive reward only upon completion of the task
  and no reward otherwise. This naturally leads to poor convergence
  properties and high variance during RL.

  In this work, we propose using formal specifications in the form of symbolic
  automata: these generalize both bounded-time temporal-logic
  specifications and automata. Furthermore, our
  use of symbolic automata allows us to define non-sparse, potential-based
  rewards that empirically shape the reward surface, leading to better
  convergence during RL. We also show that our potential-based rewarding
  strategy still allows us to obtain the policy that maximizes the satisfaction
  of the given specification.

\end{abstract}