
\begin{abstract}

Multi-agent reinforcement learning (MARL) provides a framework for problems involving multiple interacting agents.
Despite their apparent similarity to the single-agent case, multi-agent problems are often harder to train on and to analyze theoretically.
In this work, we propose \mt{}, a new on-policy actor-critic algorithm that extends \vt{} to the MARL setting.
The key advantage of our algorithm is its high scalability in a multi-worker setting.
To this end, \mt{} uses importance sampling as an off-policy correction method, which allows the computations to be distributed with no impact on the quality of training.
Furthermore, our algorithm is theoretically grounded: we prove a fixed-point theorem that guarantees convergence.
We evaluate the algorithm extensively on the StarCraft Multi-Agent Challenge, a standard benchmark for multi-agent algorithms.
\mt{} achieves high performance on all of its tasks and exceeds state-of-the-art results on some of them.

\end{abstract}
