\begin{abstract}
We consider the networked multi-agent reinforcement learning (MARL) problem in a fully decentralized setting, where agents learn to coordinate to achieve joint success. This problem arises in many areas, including traffic control, distributed control, and smart grids.
We assume that each agent is located at a node of a communication network and can exchange information only with its neighbors. Using softmax temporal consistency, we derive a decentralized primal-dual optimization method and obtain a principled, data-efficient iterative algorithm named {\em value propagation}. We prove a non-asymptotic convergence rate of $\mathcal{O}(1/T)$ with nonlinear function approximation. To the best of our knowledge, this is the first MARL algorithm with a convergence guarantee in the fully decentralized, off-policy control setting with nonlinear function approximation.
\end{abstract}
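To make the softmax temporal consistency referred to in the abstract concrete, we recall it here as an illustrative sketch. The notation ($R$, $V^*$, $\pi^*$, discount $\gamma$, temperature $\lambda$) is assumed for this illustration and may differ from the notation used in the main text. For an entropy-regularized MDP with discount factor $\gamma$ and temperature $\lambda > 0$, the optimal value function $V^*$ and optimal policy $\pi^*$ satisfy, for every state $s$ and action $a$,
\[
V^*(s) = R(s,a) + \gamma\, \mathbf{E}_{s' \mid s,a}\big[V^*(s')\big] - \lambda \log \pi^*(a \mid s).
\]
Assume further, for this sketch only, that the team reward is the average of $N$ local rewards, $R(s,a) = \frac{1}{N}\sum_{i=1}^{N} R_i(s,a)$, and that the optimal joint policy factorizes as $\pi^*(a \mid s) = \prod_{i=1}^{N} \pi_i^*(a_i \mid s)$. Then the right-hand side splits into per-agent terms:
\[
V^*(s) - \gamma\, \mathbf{E}_{s' \mid s,a}\big[V^*(s')\big] = \frac{1}{N}\sum_{i=1}^{N} \Big( R_i(s,a) - N\lambda \log \pi_i^*(a_i \mid s) \Big).
\]
Each summand depends only on quantities local to agent $i$, which suggests why a consensus-type primal-dual scheme, in which agents exchange estimates only with their neighbors, can enforce this condition without a central coordinator. This decomposition is a hedged illustration under the stated assumptions, not necessarily the exact objective used by value propagation.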