1: \begin{abstract}
2: %\cb
3: We study the limiting behavior of the mixed strategies produced by a general class of optimal no-regret learning strategies in a repeated-game setting in which the stage game is any $2 \times 2$ competitive game (one for which all Nash equilibria are completely mixed), whether zero-sum or non-zero-sum.
4: We consider optimal no-regret strategies that are mean-based (i.e., the information set at each step is the empirical average of the opponent's realized play) and monotonic (either non-decreasing or non-increasing) in their argument.
5: We show that for \textit{any} such choice of strategies, the limiting mixed strategies of the players cannot converge almost surely to any Nash equilibrium.
6: This negative result is also shown to hold under a broad class of relaxations of these assumptions, which includes popular variants of Online-Mirror-Descent with optimism and/or adaptive step-sizes.
7: Finally, we conjecture that the monotonicity assumption can be removed, and provide partial evidence for this conjecture.
8: Our results identify the inherent stochasticity in players' realizations as a critical factor underlying this divergence, and demonstrate a crucial difference in outcomes between using the opponent's mixtures and realizations to make strategy updates.
9:
10: \keywords{No-regret Learning, Game Theory, Repeated Games, $2 \times 2$ Games, Last-iterate Convergence.}
11:
12:
13: \end{abstract}
14: