\begin{abstract}
  Dynamic programming and heuristic search are at the core of
  state-of-the-art solvers for sequential decision-making problems.
  In partially observable or collaborative settings (\eg, POMDPs and
  Dec-POMDPs), this requires introducing an appropriate statistic that
  induces a fully observable problem, as well as bounding (convex)
  approximators of the optimal value function.
  This approach has also succeeded in some subclasses of 2-player zero-sum
  partially observable stochastic games (zs-POSGs), but has failed
  in the general case despite known concavity and convexity
  properties, which have only led to heuristic algorithms with poor
  convergence guarantees.
  We overcome this issue by leveraging these properties to derive
  bounding approximators and efficient update and selection operators,
  and then derive a prototypical solver inspired by HSVI that
  provably converges to an $\epsilon$-optimal solution in finite time,
  and which we evaluate empirically.
  This opens the door to a novel family of promising approaches
  complementing those relying on linear programming or iterative
  methods.
\end{abstract}