\begin{abstract}
We consider the inverse reinforcement learning problem, that is, the problem of learning from, and then predicting or mimicking, a controller based on state/action data.
We propose a statistical model for such data, derived from the
structure of a Markov decision process. Adopting a Bayesian approach to
inference, we show how latent variables of the model can be estimated, and how predictions about actions can be made, in a unified
framework. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from the posterior distribution. The sampler includes a parameter expansion step, which is shown to be essential for good convergence properties. As an illustration, the method is applied to learning a human controller.
\end{abstract}