abstract:68f9222b63eb4a0a.tex

1: \begin{abstract}

2: %\cb

3: We study the limiting behavior of the mixed strategies produced by a general class of optimal no-regret learning strategies in a repeated game whose stage game is any $2 \times 2$ competitive game (one in which all Nash equilibria are completely mixed), whether zero-sum or non-zero-sum.

4: We consider optimal no-regret strategies that are mean-based (i.e., the information set at each step is the empirical average of the opponent's realized play) and monotonic (either non-decreasing or non-increasing) in their argument.

5: We show that for \textit{any} such choice of strategies, the limiting mixed strategies of the players cannot converge almost surely to any Nash equilibrium.

6: We also show that this negative result holds under a broad class of relaxations of these assumptions, including popular variants of Online-Mirror-Descent with optimism and/or adaptive step-sizes.

7: Finally, we conjecture that the monotonicity assumption can be removed, and provide partial evidence for this conjecture.

8: Our results identify the inherent stochasticity in players' realizations as a critical factor underlying this divergence, and demonstrate a crucial difference in outcomes between updating strategies using the opponent's mixtures and updating them using the opponent's realizations.

9:

10: \keywords{No-regret Learning, Game Theory, Repeated Games, $2 \times 2$ games, Last-iterate Convergence.}

11:

12:

13: \end{abstract}

14: