\begin{abstract}
    In this work, we generalize the problem of learning through interaction in a POMDP by accounting for additional information that may be available at training time.
    First, we introduce the informed POMDP, a new learning paradigm that clearly distinguishes the information available at training time from the observation available at execution time.
    Next, we propose an objective that leverages this information to learn a sufficient statistic of the history for optimal control.
    We then show that optimizing this informed objective amounts to learning an environment model from which latent trajectories can be sampled.
    Finally, we show that using this informed environment model in the Dreamer algorithm can greatly improve the convergence speed of the policies in several environments.
    These results, together with the simplicity of the proposed adaptation, advocate for systematically considering any additional information available when learning in a POMDP with model-based RL.
\end{abstract}