\begin{abstract}
  In this article, we discuss how to solve information-gathering problems
  expressed as $\rho$-POMDPs, an extension of Partially Observable
  Markov Decision Processes (POMDPs) whose reward $\rho$
  depends on the belief state.
  Point-based approaches used for solving POMDPs have been extended to
  solving $\rho$-POMDPs as belief MDPs when the reward $\rho$ is
  convex in $\cB$ or when it is Lipschitz-continuous.
  In the present paper, we build on the POMCP algorithm to
  propose a Monte Carlo Tree Search algorithm for $\rho$-POMDPs, aiming for an
  efficient on-line planner that can be used for any $\rho$ function.
  Because rewards are belief-dependent, adaptations are required to (i)
  propagate more than one state at a time, and (ii) prevent biases in
  value estimates.
  A proof of asymptotic convergence to $\epsilon$-optimal values is given when $\rho$ is continuous.
  Experiments are conducted to analyze the algorithms at hand and show
  that they outperform myopic approaches.
\end{abstract}