abstract:18124b03692d2f84.tex

1: \begin{abstract}

2: %

3: \lcrl is a software tool that implements model-free Reinforcement Learning (RL) algorithms

4: over unknown Markov Decision Processes (MDPs),

5: synthesising policies that satisfy a given linear temporal specification with maximal probability.

6: \lcrl leverages partially deterministic finite-state machines known as Limit Deterministic B\"uchi Automata (LDBA)

7: to express a given linear temporal specification.

8: A reward function for the RL algorithm is shaped on-the-fly, based on the structure of the LDBA.

9: Theoretical guarantees under proper assumptions ensure the

10: convergence of the RL algorithm to an optimal policy that maximises the satisfaction probability.

11: We present case studies to

12: demonstrate the applicability, ease of use, scalability and performance of \lcrl~. Owing to the LDBA-guided exploration

13: and \lcrl model-free architecture, we observe robust performance, which also scales well when compared to standard RL approaches (whenever applicable to LTL specifications). Full instructions on how to execute all the case studies in this

14: paper are provided on a GitHub page that accompanies the \lcrl

15: distribution \url{www.github.com/grockious/lcrl}.

16:

17: \end{abstract}

18: