
\begin{abstract}
We study best-response-type learning dynamics for two-player zero-sum
matrix games. We consider two settings that are distinguished by the
type of information that each player has about the game and their
opponent’s strategy. The first setting is the full information case,
in which each player knows their own and the opponent’s payoff
matrices and observes the opponent’s mixed strategy. The second
setting is the \infosetuplower{} case, where players do not observe the
opponent’s strategy and do not know either of the payoff matrices
(instead, they observe only their realized payoffs). For this setting,
also known as the radically uncoupled case in the learning-in-games
literature, we study two-timescale learning dynamics that combine
smoothed best-response-type updates for strategy estimates with a
TD-learning update to estimate a local payoff function. For these
dynamics, without additional exploration, we provide polynomial-time
finite-sample guarantees for convergence to an $\epsilon$-Nash
equilibrium.
\end{abstract}
