\begin{abstract}
In this article, we discuss how to solve information-gathering problems
expressed as $\rho$-POMDPs, an extension of Partially Observable
Markov Decision Processes (POMDPs) whose reward $\rho$
depends on the belief state.
Point-based approaches used for solving POMDPs have been extended to
solving $\rho$-POMDPs as belief MDPs when the reward $\rho$ is
convex in $\cB$ or when it is Lipschitz-continuous.
Here, we build on the POMCP algorithm to
propose a Monte Carlo Tree Search algorithm for $\rho$-POMDPs, aiming for an
efficient on-line planner that can be used with any $\rho$ function.
The belief-dependent rewards require adaptations to (i)
propagate more than one state at a time, and (ii) prevent biases in
value estimates.
We prove asymptotic convergence to $\epsilon$-optimal values when $\rho$ is continuous.
Experiments analyze the proposed algorithms and show
that they outperform myopic approaches.
\end{abstract}
