
\begin{abstract}

We consider the networked multi-agent reinforcement learning (MARL) problem in a fully decentralized setting, where agents learn to coordinate to achieve joint success. This problem arises in many areas, including traffic control, distributed control, and smart grids.

We assume that each agent is located at a node of a communication network and can exchange information only with its neighbors. Using softmax temporal consistency, we derive a primal-dual decentralized optimization method and obtain a principled and data-efficient iterative algorithm named {\em value propagation}. We prove a non-asymptotic convergence rate of $\mathcal{O}(1/T)$ with nonlinear function approximation. To the best of our knowledge, this is the first MARL algorithm with a convergence guarantee in the control, off-policy, nonlinear function approximation, fully decentralized setting.

\end{abstract}
