\begin{abstract}
  In this article, we discuss how to solve information-gathering problems
  expressed as $\rho$-POMDPs, an extension of Partially Observable
  Markov Decision Processes (POMDPs) whose reward $\rho$
  depends on the belief state.
  Point-based approaches for solving POMDPs have been extended to
  solving $\rho$-POMDPs as belief MDPs when the reward $\rho$ is
  convex in $\cB$ or when it is Lipschitz-continuous.
  In the present paper, we build on the POMCP algorithm to
  propose a Monte Carlo Tree Search for $\rho$-POMDPs, aiming for an
  efficient on-line planner that can be used with any $\rho$ function.
  Because rewards depend on the belief, two adaptations are required:
  (i) propagating more than one state at a time, and (ii) preventing
  biases in value estimates.
  We prove asymptotic convergence to $\epsilon$-optimal values when
  $\rho$ is continuous.
  Experiments analyze the proposed algorithms and show
  that they outperform myopic approaches.

18: \end{abstract}

19: