\begin{abstract}
Mean Field Games (MFGs) can model large-scale multi-agent systems, but learning Nash equilibria in MFGs remains challenging.
% Fictitious play (FP) and Online Mirror Descent (OMD) are two effective strategies for learning equilibria in MFGs.
% However, FP requires storing all historical best responses and sampling from the best response pool during execution, while OMD requires averaging historical Q-functions which is not feasible for neural networks. Moreover, the existing literature often assumes that agents always start from a fixed initial distribution.
In this paper, we propose a deep reinforcement learning (DRL) algorithm, inspired by Munchausen RL and Online Mirror Descent, that learns a population-dependent Nash equilibrium without averaging over or sampling from history. An additional inner-loop replay buffer allows the agents to learn to reach the Nash equilibrium from any distribution while mitigating catastrophic forgetting. The resulting policy can be applied to various initial distributions. Numerical experiments on four canonical examples demonstrate that our algorithm has better convergence properties than state-of-the-art algorithms, in particular a DRL version of Fictitious Play for population-dependent policies.
\end{abstract}