\begin{abstract}
%This paper studies the policy optimization problem for collaborative multi-agent reinforcement learning over a decentralized network. We propose a novel decentralized natural policy gradient method, named MDNPG, which incorporates natural gradient, momentum-based variance reduction, and gradient tracking into decentralized stochastic gradient ascent. We establish the sample complexity of $\calO(n^{-1}\epsilon^{-3})$ to achieve an $\epsilon$-stationary point under standard assumptions, where $n$ is the number of agents. Such sample complexity matches the best available rate for decentralized policy gradient methods and leads to a linear speedup compared with centralized optimization approaches. Numerical experiments provide empirical verification of our theoretical results.
This paper studies a policy optimization problem arising from collaborative multi-agent reinforcement learning in a decentralized setting, where agents communicate with their neighbors over an undirected graph to maximize the sum of their cumulative rewards. A novel decentralized natural policy gradient method, dubbed Momentum-based Decentralized Natural Policy Gradient (MDNPG), is proposed, which incorporates natural gradient, momentum-based variance reduction, and gradient tracking into the decentralized stochastic gradient ascent framework. Under standard assumptions, an $\calO(n^{-1}\epsilon^{-3})$ sample complexity is established for MDNPG to converge to an $\epsilon$-stationary point, where $n$ is the number of agents. This result indicates that MDNPG achieves the optimal convergence rate for decentralized policy gradient methods and enjoys a linear speedup over centralized optimization methods. Moreover, extensive numerical experiments demonstrate the superior empirical performance of MDNPG over other state-of-the-art algorithms.
\end{abstract}