\begin{abstract}
Decentralized partially observable Markov decision processes (Dec-POMDPs) formalize the problem of designing individual controllers for a group of collaborative agents under stochastic dynamics and partial observability.
%
Seeking a global optimum is difficult (NEXP-complete), but seeking a Nash equilibrium, in which each agent's policy is a best response to the other agents' policies, is more accessible and has made it possible to address infinite-horizon problems with solutions in the form of finite-state controllers (FSCs).
%
In this paper, we show that this approach can be adapted to cases where only a generative model (a simulator) of the Dec-POMDP is available.
%
This requires relying on a simulation-based POMDP solver to construct an agent's FSC node by node.
%
A related process is used to heuristically derive initial FSCs. % that help converge to better Nash equilibria.
%
Experiments on benchmarks show that the resulting algorithm, MC-JESP, is competitive with existing Dec-POMDP solvers, and even outperforms many offline methods that use explicit models.
\end{abstract}
