\begin{abstract}
\hskip7mm

Reinforcement Learning (RL) has made significant progress in recent years and has been applied to increasingly challenging problems in domains such as robotics and resource management in computer clusters. One notable source of difficulty is the presence of multiple agents training simultaneously, which often enlarges the state and action spaces and makes the system dynamics harder to learn. In this work, we describe a multi-agent reinforcement learning (MARL) problem in which each agent must learn a policy that maximizes the long-run expected reward averaged over all agents. To tackle this cooperative problem in a decentralized way, we propose a multi-agent actor-critic algorithm with deterministic policies. Similar to recent work on decentralized actor-critic methods with stochastic policies, we provide convergence guarantees for our algorithm when linear function approximation is used. Our focus on deterministic policy algorithms is motivated by the fact that they can sometimes outperform their stochastic counterparts in high-dimensional spaces. Nevertheless, their applicability remains uncertain: the decentralized setting, in which agents keep their policies private from one another, requires on-policy learning, whereas deterministic policies are classically trained off-policy because of their limited ability to explore the environment. Still, such an algorithm may be able to learn good policies in naturally noisy environments. Finally, we discuss the strengths and shortcomings of our decentralized MARL algorithm in light of recent developments in MARL.

\end{abstract}