
\begin{abstract}

    This paper investigates Policy-Space Response Oracles (PSRO), a population-based training regime grounded in game-theoretic principles.

    PSRO is general in the sense that it (1) encompasses well-known algorithms such as fictitious play and double oracle as special cases, and (2) in principle applies to general-sum, many-player games.

    Despite this, prior studies of PSRO have focused on two-player zero-sum games, a regime wherein Nash equilibria are tractably computable.

    In moving from two-player zero-sum games to more general settings, computation of Nash equilibria quickly becomes infeasible.

    Here, we extend the theoretical underpinnings of PSRO by considering an alternative solution concept, \alpharank, which is unique (and thus faces no equilibrium selection issues, unlike Nash) and applies readily to general-sum, many-player settings.

    We establish convergence guarantees in several game classes, and identify links between Nash equilibria and \alpharank.

    We demonstrate the competitive performance of \alpharank-based PSRO against an exact Nash solver-based PSRO in 2-player Kuhn and Leduc Poker.

    We then go beyond the reach of prior PSRO applications by considering 3- to 5-player poker games, yielding instances where \alpharank achieves faster convergence than approximate Nash solvers, thus establishing it as a favorable general games solver.

    We also carry out an initial empirical validation in MuJoCo soccer, illustrating the feasibility of the proposed approach in another complex domain.

\end{abstract}
