\begin{abstract}

Owing to recent empirical successes, the options framework for hierarchical reinforcement learning is gaining popularity. Rather than learning from rewards, we consider learning an options-type hierarchical policy from expert demonstrations, a problem referred to as \emph{hierarchical imitation learning}. By casting this problem as parameter inference in a latent variable model, we develop convergence guarantees for the EM approach proposed by \citet{daniel2016probabilistic}. As an intermediate step, we analyze the population-level algorithm, which is nontrivial because the samples are correlated. If the expert policy can be parameterized by a variant of the options framework, then, under regularity conditions, we prove that the proposed algorithm converges with high probability to a norm ball around the true parameter. To our knowledge, this is the first performance guarantee for a hierarchical imitation learning algorithm that only observes primitive state-action pairs.

\end{abstract}
