
\begin{abstract}

    Algorithms based on regret matching, specifically regret matching$^+$ (RM$^+$) and its variants, are the most popular approaches for solving large-scale two-player zero-sum games in practice.

    Algorithms such as optimistic gradient descent ascent have strong last-iterate and ergodic convergence properties for zero-sum games; in contrast, virtually nothing is known about the last-iterate properties of regret-matching algorithms.

    %Since last-iterate convergence is an attractive property both for numerical optimization reasons and because no-regret learning is viewed as a plausible method of real-world learning in games,

    Given the importance of last-iterate convergence for numerical optimization and its relevance as a model of real-world learning in games, we study the last-iterate convergence properties of various popular variants of RM$^+$. First, we show numerically that several practical variants, such as simultaneous RM$^+$, alternating RM$^+$, and simultaneous predictive RM$^+$, all lack last-iterate convergence guarantees even on a simple $3\times 3$ game.

    We then prove that recent variants of these algorithms based on a \emph{smoothing} technique do enjoy last-iterate convergence: \emph{extragradient RM$^{+}$} and \emph{smooth Predictive RM$^+$} enjoy asymptotic last-iterate convergence (without a rate) and $1/\sqrt{t}$ best-iterate convergence. Finally, we introduce restarted variants of these algorithms and show that they enjoy linear-rate last-iterate convergence.

\end{abstract}
