\begin{abstract}
    Sample-efficient exploration is crucial not only for discovering rewarding experiences
    but also for adapting to environment changes in a task-agnostic fashion.
    A principled treatment of the problem of optimal input synthesis for system identification
    is provided within the framework of sequential Bayesian experimental design.
    In this paper, we present an effective trajectory-optimization-based
    approximate solution of this otherwise intractable problem
    that models optimal exploration in an unknown Markov decision process (MDP).
    By interleaving episodic exploration with Bayesian nonlinear system identification,
    our algorithm takes advantage of the inductive bias to explore in a directed manner,
    without assuming prior knowledge of the MDP.
    Empirical evaluations indicate a clear advantage of the proposed algorithm
    in terms of the rate of convergence and the final model fidelity
    when compared to intrinsic-motivation-based algorithms
    employing exploration bonuses such as prediction error and information gain.
    Moreover, our method maintains a computational advantage
    over a recent model-based active exploration (MAX) algorithm,
    by focusing on the information gain along trajectories
    instead of seeking a global exploration policy.
    A reference implementation of our algorithm and the conducted experiments
    are publicly available\footnote{\url{https://github.com/mschulth/rhc}}.
\end{abstract}