\begin{abstract}
%
The centralized training for decentralized execution paradigm has emerged as the state-of-the-art approach to $\epsilon$-optimally solving decentralized partially observable Markov decision processes. However, scalability remains a significant issue.
%
This paper presents a novel and more scalable alternative, namely sequential-move centralized training for decentralized execution.
%
This paradigm further extends the applicability of \citeauthor{bellman}'s principle of optimality, yielding three new properties.
%
First, it allows a central planner to reason over sufficient sequential-move statistics rather than the simultaneous-move statistics used previously.
%
Second, $\epsilon$-optimal value functions are shown to be piecewise linear and convex in sufficient sequential-move statistics.
%
Third, it reduces the complexity of the backup operators from doubly exponential to polynomial, at the expense of longer planning horizons.
%
Moreover, it makes single-agent methods easy to apply: \eg the SARSA algorithm, enhanced with these findings, applies while preserving its convergence guarantees.
%
Experiments on two-agent and many-agent domains from the literature, against $\epsilon$-optimal simultaneous-move solvers, confirm the superiority of the novel approach.
%
This paradigm opens the door to efficient planning and reinforcement learning methods for multi-agent systems.
%
\end{abstract}
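
For orientation, the piecewise-linear and convex property stated in the abstract can be sketched in its generic form. The symbols below are illustrative assumptions rather than the paper's own notation: $s$ denotes a sufficient sequential-move statistic viewed as a vector, $\Gamma$ a finite set of vectors of matching dimension, and $V$ a value function over such statistics.
\[
  V(s) \;=\; \max_{\alpha \in \Gamma} \langle \alpha, s \rangle .
\]
Any function of this form is convex and piecewise linear, since it is the pointwise maximum of finitely many linear functions. Under this reading, a planner could represent $V$ by the finite set $\Gamma$ alone.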