
\begin{abstract}

Multi-agent policy gradient (MAPG) methods have recently made rapid progress. However, a significant performance gap remains between MAPG methods and state-of-the-art multi-agent value-based approaches. In this paper, we investigate the causes that hinder the performance of MAPG algorithms and present a multi-agent decomposed policy gradient method (\name). This method introduces the idea of value function decomposition into the multi-agent actor-critic framework. Based on this idea, \name~supports efficient off-policy learning and addresses the issues of \emph{centralized-decentralized mismatch} and credit assignment in both discrete and continuous action spaces. We formally show that \name~critics have sufficient representational capability to guarantee convergence. In addition, empirical evaluations on the StarCraft II micromanagement benchmark and multi-agent particle environments demonstrate that \name~significantly outperforms both state-of-the-art value-based and policy-based multi-agent reinforcement learning algorithms. Demonstrative videos are available at \url{https://sites.google.com/view/dop-mapg/}.

\end{abstract}
