\begin{abstract}
No-regret algorithms are efficient at learning a Nash equilibrium\ (NE) in two-player zero-sum normal-form games\ (NFGs) and extensive-form games\ (EFGs). However, most of them guarantee only average-iterate convergence, which is unsatisfactory for large games whose strategies must be represented by deep neural networks, since training a network to represent the average strategy is often challenging and introduces additional instability.
Last-iterate convergence is therefore more desirable.
Recent studies have designed a reward transformation (RT) framework that establishes last-iterate convergence for Multiplicative Weight Updates (MWU), a classic no-regret algorithm, and this framework has been used to build superhuman AI.
However, the performance of this framework is inadequate.
To improve it, we analyze the RT framework more closely.
We demonstrate that the essence of the RT framework is to transform the problem of learning an NE of the original game into a series of strongly convex-concave optimization problems (SCCPs) whose saddle points converge to an NE of the original game.
We show that the bottleneck of algorithms built on the RT framework is the speed at which these SCCPs are solved.
Inspired by this, we design a novel transformation method that allows the SCCPs to be solved by Regret Matching+\ (RM+), an algorithm with lower computational cost and better empirical performance than MWU.
We refer to the resulting algorithm as \textit{Reward Transformation Regret Matching+\ (RTRM+)}.
Then, we propose \textit{Reward Transformation Counterfactual Regret Minimization+\ (RTCFR+)}, which extends RTRM+ to EFGs by combining it with the counterfactual regret decomposition framework.
Experimental results show that our algorithms significantly outperform existing last-iterate and average-iterate convergence algorithms in NFGs and EFGs.
% original convex-concave optimization problem
\end{abstract}
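
% Illustrative sketch only; the symbols and the regularizer below are assumptions, not taken from the source.
As a rough illustration of the SCCP idea summarized in the abstract, consider an NFG with payoff matrix $A \in \mathbb{R}^{m \times n}$ and strategy simplices $\Delta_m$ and $\Delta_n$. The symbols $A$, $\mu$, $x^{r}$, and $y^{r}$, as well as the squared Euclidean regularizer, are assumptions introduced here for illustration and need not match the transformation actually used. Learning an NE amounts to solving the convex-concave problem
\[
  \min_{x \in \Delta_m} \max_{y \in \Delta_n} \; x^{\top} A y .
\]
One possible way to obtain a strongly convex-concave problem is to regularize toward reference strategies $x^{r}$ and $y^{r}$ with a weight $\mu > 0$:
\[
  \min_{x \in \Delta_m} \max_{y \in \Delta_n} \;
  x^{\top} A y + \frac{\mu}{2} \| x - x^{r} \|_2^2 - \frac{\mu}{2} \| y - y^{r} \|_2^2 .
\]
For $\mu > 0$ this problem has a unique saddle point. Resetting $(x^{r}, y^{r})$ to that saddle point and solving again yields a series of such problems. For reference, the standard RM+ update for a player with strategy $x^{t}$ and utility vector $u^{t}$ at iteration $t$ is
\[
  r^{t} = u^{t} - \langle u^{t}, x^{t} \rangle \mathbf{1}, \qquad
  Q^{t+1} = \max\bigl( Q^{t} + r^{t}, \, 0 \bigr), \qquad
  x^{t+1} = \frac{Q^{t+1}}{\| Q^{t+1} \|_1},
\]
where the maximum is taken componentwise, $Q^{1} = 0$, and $x^{t+1}$ is taken to be uniform whenever $Q^{t+1} = 0$.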