\begin{abstract}
Achieving convergence of multiple learning agents in general $N$-player games
is imperative for the development of safe and reliable machine learning (ML) algorithms and their application to
autonomous systems. Yet it is known that, outside the bounds of simple two-player
games, convergence cannot be taken for granted.

To make progress in resolving this problem, we study the dynamics of smooth Q-Learning, a popular reinforcement learning algorithm that quantifies the tendency for learning agents to explore their state space or exploit their payoffs. We show a sufficient condition on the rate of exploration such that the Q-Learning dynamics are guaranteed to converge to a unique equilibrium in any game. We connect this result to games for which Q-Learning is known to converge with arbitrary exploration rates, including weighted potential games and weighted zero-sum polymatrix games.

Finally, we examine the performance of the Q-Learning dynamics as measured by the time-averaged social welfare, and compare this with the social welfare achieved by the equilibrium. We provide a sufficient condition under which the Q-Learning dynamics will outperform the equilibrium even if the dynamics do not converge.
\end{abstract}