
\begin{abstract}
    In this work, we generalize the problem of learning through interaction in a POMDP by accounting for additional information that may be available at training time.
    First, we introduce the informed POMDP, a new learning paradigm offering a clear distinction between the training information and the execution observation.
    Next, we propose an objective that leverages this information to learn a sufficient statistic of the history for optimal control.
    We then show that this informed objective amounts to learning an environment model from which we can sample latent trajectories.
    Finally, we show that, for the Dreamer algorithm, using this informed environment model can greatly improve the convergence speed of the policies in several environments.
    These results and the simplicity of the proposed adaptation advocate for systematically considering any additional information that may be available when learning in a POMDP using model-based RL.
\end{abstract}
