abstract:c06db997e85d71a6.tex

1: \begin{abstract}

2: Reinforcement Learning (RL) struggles in problems with delayed rewards, and one approach is to segment the task into sub-tasks with incremental rewards.

3: We propose \hirlfull (\hirl), a framework for learning sub-task structure from demonstrations.

4: \hirl decomposes the task into sub-tasks based on transitions that are consistent across demonstrations. These transitions are defined as changes in local linearity with respect to a kernel function~\cite{krishnan2015tsc}.

5: \hirl then uses the inferred structure to learn reward functions that are local to the sub-tasks while still handling global dependencies such as sequentiality.

6:

7: We have evaluated \hirl on several standard RL benchmarks: Parallel Parking with noisy dynamics, Two-Link Pendulum, 2D Noisy Motion Planning, and a Pinball environment.

8: In the parallel parking task, we find that rewards constructed with \hirl converge to a policy with an 80\% success rate in 32\% fewer time-steps than those constructed with Maximum Entropy Inverse RL (MaxEnt IRL). With partial state observation, the policies learned with IRL fail to achieve this accuracy, while \hirl still converges.

9: We further find that the rewards learned with \hirl are robust to environment noise: they can tolerate 1 standard deviation of random perturbation in the poses of the environment obstacles while maintaining roughly the same convergence rate.

10: We find that \hirl rewards can converge up to $6\times$ faster than rewards constructed with IRL.

11: \end{abstract}