\begin{abstract}
  Dynamic programming and heuristic search are at the core of
  state-of-the-art solvers for sequential decision-making problems.
  In partially observable or collaborative settings (\eg, POMDPs and
  Dec-POMDPs), applying them requires introducing an appropriate
  statistic that induces a fully observable problem, as well as
  bounding (convex) approximators of the optimal value function.
  This approach has also succeeded in some subclasses of 2-player
  zero-sum partially observable stochastic games (zs-POSGs), but has
  failed in the general case despite known concavity and convexity
  properties, leading only to heuristic algorithms with poor
  convergence guarantees.
  We overcome this issue by leveraging these properties to derive
  bounding approximators and efficient update and selection operators,
  from which we build a prototypical HSVI-inspired solver that
  provably converges to an $\epsilon$-optimal solution in finite time
  and which we evaluate empirically.
  This opens the door to a novel family of promising approaches
  complementing those relying on linear programming or iterative
  methods.
\end{abstract}