
\begin{abstract}

    We study online learning and equilibrium computation in games with polyhedral decision sets, a property shared by normal-form games (NFGs) and extensive-form games (EFGs), when the learning agent is restricted to using a best-response oracle.
    We show how to achieve constant regret in zero-sum games and $O(T^{1/4})$ regret in general-sum games while using only $O(\log t)$ best-response queries at a given iteration $t$, thus improving over the best prior result, which required $O(T)$ queries per iteration.
    Moreover, our framework yields the first last-iterate convergence guarantees for self-play with best-response oracles in zero-sum games.
    This convergence occurs at a linear rate, though with a condition-number dependence. We go on to show an $O(1/\sqrt{T})$ best-iterate convergence rate without such a dependence.

    Our results build on linear-rate convergence results for variants of the Frank-Wolfe (\fw{}) algorithm for strongly convex and smooth minimization problems over polyhedral domains.
    These \fw{} results depend on a condition number of the polytope, known as the facial distance. To enable application to settings such as EFGs, we show two broad new results:
    1) the facial distance for polytopes of the form $\{\vx \in \R^n_{\geq 0} \mid \bA\vx = \vb\}$ is at least $\gamma / \sqrt{k}$, where $\gamma$ is the minimum value of a nonzero coordinate of a vertex of the polytope and $k\leq n$ is the number of tight inequality constraints in the optimal face,
    and
    2) the facial distance for polytopes of the form $\{\vx \in \R^n \mid \bA\vx = \vb,\ \bC \vx \leq \vd,\ \vx \geq \mathbf{0}\}$, where $\bC \geq \mathbf{0}$ is a nonzero integral matrix and $\vd \geq \mathbf{0}$, is at least $1/(\|\bC\|_\infty \sqrt{n})$.
    This yields the first such results for several problems, such as sequence-form polytopes, flow polytopes, and matching polytopes.


\end{abstract}
