abstract:8541a2ffee823934.tex

1: \begin{abstract}

2: %

3: The centralized training for decentralized execution paradigm has emerged as the state-of-the-art approach to $\epsilon$-optimally solving decentralized partially observable Markov decision processes. However, scalability remains a significant issue.

4: %

5: This paper presents a novel and more scalable alternative, namely sequential-move centralized training for decentralized execution.

6: %

7: This paradigm further pushes the applicability of \citeauthor{bellman}'s principle of optimality, yielding three new properties.

8: %

9: First, it allows a central planner to reason over sufficient sequential-move statistics instead of the prior simultaneous-move ones.

10: %

11: Next, it proves that $\epsilon$-optimal value functions are piecewise linear and convex in sufficient sequential-move statistics.

12: %

13: Finally, it reduces the complexity of the backup operators from doubly exponential to polynomial, at the expense of longer planning horizons.

14: %

15: Moreover, it makes single-agent methods easy to apply: \eg the SARSA algorithm, enhanced with these findings, applies while preserving its convergence guarantees.

16: %

17: Experiments on two-agent as well as many-agent domains from the literature, run against $\epsilon$-optimal simultaneous-move solvers, confirm the superiority of the novel approach.

18: %

19: This paradigm opens the door to efficient planning and reinforcement learning methods for multi-agent systems.

20: %

21: \end{abstract}

22: