
1: \begin{abstract}

2:   Decentralized partially observable Markov decision processes (Dec-POMDPs) formalize the problem of designing individual controllers for a group of collaborative agents under stochastic dynamics and partial observability.

3:   %

4:   Seeking a global optimum is difficult (NEXP-complete), but seeking a Nash equilibrium, in which each agent's policy is a best response to the other agents' policies, is more accessible and has made it possible to address infinite-horizon problems with solutions in the form of finite-state controllers (FSCs).

5:   %

6:   In this paper, we show that this approach can be adapted to cases where only a generative model (a simulator) of the Dec-POMDP is available.

7:   %

8:   This requires a simulation-based POMDP solver to construct an agent's FSC node by node.

9:   %

10:   A related process is used to heuristically derive initial FSCs. % that help converge to better Nash equilibria.

11:   %

12:   Experiments on benchmarks show that the resulting algorithm, MC-JESP, is competitive with existing Dec-POMDP solvers and even outperforms many offline methods that use explicit models.

13: \end{abstract}

14: