\begin{abstract}
We study best-response type learning dynamics for two-player zero-sum
matrix games. We consider two settings that are distinguished by the
type of information that each player has about the game and their
opponent’s strategy. The first setting is the full information case,
in which each player knows their own and the opponent’s payoff
matrices and observes the opponent’s mixed strategy. The second
setting is the \infosetuplower case, where players do not observe the
opponent’s strategy and are not aware of either of the payoff matrices
(instead they only observe their realized payoffs). For this setting,
also known as the radically uncoupled case in the learning in games
literature, we study two-timescale learning dynamics that combine
smoothed best-response type updates for strategy estimates with a
TD-learning update to estimate a local payoff function. For these
dynamics, without additional exploration, we provide polynomial-time
finite-sample guarantees for convergence to an $\epsilon$-Nash
equilibrium.
\end{abstract}