
\begin{abstract}

Multi-agent reinforcement learning (MARL), despite its popularity and empirical success, suffers from the curse of dimensionality. This paper builds a mathematical framework to approximate cooperative MARL by a mean-field control (MFC) approach, and shows that the approximation error is of order $\mathcal{O}(\frac{1}{\sqrt{N}})$. By establishing an appropriate form of the dynamic programming principle for both the value function and the Q function, it proposes a model-free kernel-based Q-learning algorithm (MFC-K-Q), which is shown to have a linear convergence rate for the MFC problem, the first of its kind in the MARL literature. It further establishes that the convergence rate and the sample complexity of MFC-K-Q are independent of the number of agents $N$, and that MFC-K-Q provides an $\mathcal{O}(\frac{1}{\sqrt{N}})$ approximation to the MARL problem with $N$ agents in the learning environment.


Empirical studies for the network traffic congestion problem demonstrate that MFC-K-Q outperforms existing MARL algorithms when $N$ is large, for instance when $N>50$.

\end{abstract}
