\begin{abstract}
This paper presents a general mean-field game (GMFG) framework for simultaneous learning and decision-making in stochastic games with a large population. It first establishes the existence of a unique Nash equilibrium for this GMFG and explains why naively combining
Q-learning with the fixed-point approach of classical MFGs yields unstable algorithms. It then proposes a Q-learning algorithm with a Boltzmann policy (GMF-Q) and analyzes its convergence properties and computational complexity.
Experiments on repeated ad auction problems demonstrate that the GMF-Q algorithm is
efficient and robust in terms of convergence and learning accuracy. Moreover, it outperforms existing multi-agent reinforcement learning algorithms in convergence, stability, and learning ability.
\end{abstract}