1: \begin{abstract}
2:   Decentralized partially observable Markov decision processes (Dec-POMDPs) formalize the problem of designing individual controllers for a group of collaborative agents under stochastic dynamics and partial observability.
3:   %
4:   Seeking a global optimum is difficult (NEXP-complete), but seeking a Nash equilibrium, in which each agent's policy is a best response to the other agents' policies, is more accessible and has made it possible to address infinite-horizon problems with solutions in the form of finite-state controllers (FSCs).
5:   %
6:   In this paper, we show that this approach can be adapted to cases where only a generative model (a simulator) of the Dec-POMDP is available.
7:   %
8:   This requires relying on a simulation-based POMDP solver to construct an agent's FSC node by node.
9:   %
10:   A related process is used to heuristically derive initial FSCs. % that help converge to better Nash equilibria.
11:   %
12:   Experiments on benchmark problems show that the resulting approach, MC-JESP, is competitive with existing Dec-POMDP solvers and even outperforms many offline methods that use explicit models.
13: \end{abstract}
14: