\begin{abstract}

Current approaches to multi-agent cooperation rely heavily on centralized mechanisms or explicit communication protocols to ensure convergence.

This paper studies the problem of distributed multi-agent learning without resorting to explicit coordination schemes.

The proposed algorithm (DM$^2$) leverages distribution matching to facilitate coordination among independent agents.

Each agent matches a target distribution of concurrently sampled trajectories from a joint expert policy.

The theoretical analysis shows that, under certain conditions, if each agent optimizes its individual distribution matching objective, the agents increase a lower bound on the objective of matching the joint expert policy, allowing convergence to the joint expert policy.

Further, if the distribution matching objective is aligned with a joint task, a combination of environment reward and distribution matching reward leads to the same equilibrium.

Experimental validation on the StarCraft domain shows that combining the reward for distribution matching with the environment reward allows agents to outperform a fully distributed baseline.

Additional experiments probe the conditions under which expert demonstrations need to be sampled in order to outperform the fully distributed baseline.

\end{abstract}
