\begin{abstract}
[Zhang, ICML 2018] provided the first decentralized actor-critic algorithm for multi-agent reinforcement learning (MARL) that offers convergence guarantees.
In that work, policies are stochastic and are defined on finite action spaces.
We extend those results to offer a provably convergent decentralized actor-critic algorithm for learning deterministic policies on continuous action spaces.
Deterministic policies are important in real-world settings.
To handle the lack of exploration inherent in deterministic policies, we consider both off-policy and on-policy settings.
We provide the expression of a local deterministic policy gradient, decentralized deterministic actor-critic algorithms, and convergence guarantees for linearly approximated value functions.
This work will help enable decentralized MARL in high-dimensional action spaces and pave the way for more widespread use of MARL.
\end{abstract}