\begin{abstract}
Sample-efficient exploration is crucial not only for discovering rewarding experiences
but also for adapting to environment changes in a task-agnostic fashion.
A principled treatment of the problem of optimal input synthesis for system identification
is provided within the framework of sequential Bayesian experimental design.
In this paper, we present an effective trajectory-optimization-based
approximate solution of this otherwise intractable problem
that models optimal exploration in an unknown Markov decision process (MDP).
By interleaving episodic exploration with Bayesian nonlinear system identification,
our algorithm takes advantage of the inductive bias to explore in a directed manner,
without assuming prior knowledge of the MDP.
Empirical evaluations indicate a clear advantage of the proposed algorithm
in terms of the rate of convergence and the final model fidelity
when compared to intrinsic-motivation-based algorithms
employing exploration bonuses such as prediction error and information gain.
Moreover, our method maintains a computational advantage
over a recent model-based active exploration (MAX) algorithm,
by focusing on the information gain along trajectories
instead of seeking a global exploration policy.
A reference implementation of our algorithm and the conducted experiments
is publicly available\footnote{\url{https://github.com/mschulth/rhc}}.
\end{abstract}
