\begin{abstract}
    Sample-efficient exploration is crucial not only for discovering rewarding experiences
    but also for adapting to environment changes in a task-agnostic fashion.
    Sequential Bayesian experimental design provides a principled treatment
    of the problem of optimal input synthesis for system identification.
    In this paper, we model optimal exploration in an unknown Markov decision process (MDP)
    within this framework and present an effective approximate solution,
    based on trajectory optimization, to this otherwise intractable problem.
    By interleaving episodic exploration with Bayesian nonlinear system identification,
    our algorithm takes advantage of the inductive bias to explore in a directed manner,
    without assuming prior knowledge of the MDP.
    Empirical evaluations indicate a clear advantage of the proposed algorithm
    in terms of the rate of convergence and the final model fidelity
    when compared to intrinsic-motivation-based algorithms
    employing exploration bonuses such as prediction error and information gain.
    Moreover, our method retains a computational advantage
    over the recent model-based active exploration (MAX) algorithm
    by focusing on the information gain along trajectories
    instead of seeking a global exploration policy.
    A reference implementation of our algorithm and the code for the conducted experiments
    are publicly available\footnote{\url{https://github.com/mschulth/rhc}}.
\end{abstract}