
\begin{abstract}

Achieving convergence of multiple learning agents in general $N$-player games
is imperative for the development of safe and reliable machine learning (ML) algorithms and their application to
autonomous systems. Yet it is known that, outside the bounds of simple two-player
games, convergence cannot be taken for granted.


To make progress in resolving this problem, we study the dynamics of smooth Q-Learning, a popular reinforcement learning algorithm that quantifies the tendency of learning agents to explore their state space or exploit their payoffs. We show a sufficient condition on the rate of exploration such that the Q-Learning dynamics are guaranteed to converge to a unique equilibrium in any game. We connect this result to games for which Q-Learning is known to converge with arbitrary exploration rates, including weighted potential games and weighted zero-sum polymatrix games.


Finally, we examine the performance of the Q-Learning dynamics, as measured by the Time Averaged Social Welfare, and compare this with the Social Welfare achieved by the equilibrium. We provide a sufficient condition under which the Q-Learning dynamics outperform the equilibrium even if the dynamics do not converge.

\end{abstract}
