\begin{abstract}
Multi-agent reinforcement learning (MARL) provides a framework for problems involving multiple interacting agents.
Despite their apparent similarity to the single-agent case, multi-agent problems are often harder to train and to analyze theoretically.
In this work, we propose \mt{}, a new on-policy actor-critic algorithm that extends \vt{} to the MARL setting.
The key advantage of our algorithm is its high scalability in a multi-worker setting.
To this end, \mt{} uses importance sampling as an off-policy correction method, which allows the computation to be distributed without degrading the quality of training.
Furthermore, our algorithm is theoretically grounded: we prove a fixed-point theorem that guarantees convergence.
We evaluate the algorithm extensively on the StarCraft Multi-Agent Challenge, a standard benchmark for multi-agent algorithms.
\mt{} achieves high performance on all tasks of this benchmark and exceeds state-of-the-art results on some of them.
\end{abstract}