\begin{abstract}
Dynamic programming and heuristic search are at the core of
state-of-the-art solvers for sequential decision-making problems.
In partially observable or collaborative settings (\eg, POMDPs and
Dec-POMDPs), applying them requires introducing an appropriate statistic
that induces a fully observable problem, as well as bounding (convex)
approximators of the optimal value function.
This approach has also succeeded in some subclasses of 2-player zero-sum
partially observable stochastic games (zs-POSGs), but has failed
in the general case despite known concavity and convexity
properties, which have so far only led to heuristic algorithms with poor
convergence guarantees.
We overcome this issue by leveraging these properties to derive
bounding approximators and efficient update and selection operators,
from which we build a prototypical solver inspired by HSVI that
provably converges to an $\epsilon$-optimal solution in finite time
and that we evaluate empirically.
This opens the door to a novel family of promising approaches
complementing those relying on linear programming or iterative
methods.
\end{abstract}