\begin{abstract}
[Zhang, ICML 2018] provided the first decentralized actor-critic algorithm for multi-agent reinforcement learning (MARL) with convergence guarantees.
In that work, policies are stochastic and defined on finite action spaces.
We extend those results to a provably convergent decentralized actor-critic algorithm
for learning deterministic policies on continuous action spaces. Deterministic policies are important in real-world settings. To handle the lack of exploration inherent in deterministic policies, we consider both off-policy and on-policy settings.
We provide an expression for a local deterministic policy gradient,
decentralized deterministic actor-critic algorithms,
and convergence guarantees for linearly approximated value functions.
This work helps enable decentralized MARL in high-dimensional action spaces
and paves the way for more widespread use of MARL.
\end{abstract}