\begin{abstract}
We study online learning and equilibrium computation in games with polyhedral decision sets, a property shared by normal-form games (NFGs) and extensive-form games (EFGs), when the learning agent is restricted to utilizing a best-response oracle.
We show how to achieve constant regret in zero-sum games and $O(T^{1/4})$ regret in general-sum games while using only $O(\log t)$ best-response queries at a given iteration $t$, thus improving over the best prior result, which required $O(T)$ queries per iteration.
Moreover, our framework yields the first last-iterate convergence guarantees for self-play with best-response oracles in zero-sum games.
This convergence occurs at a linear rate, though with a condition-number dependence. We go on to show an $O(1/\sqrt{T})$ best-iterate convergence rate without such a dependence.
Our results build on linear-rate convergence results for variants of the Frank-Wolfe (\fw{}) algorithm for strongly convex and smooth minimization problems over polyhedral domains.
These \fw{} results depend on a condition number of the polytope, known as the facial distance. To enable application to settings such as EFGs, we show two broad new results:
1) the facial distance for polytopes of the form $\{\vx \in \R^n_{\geq 0} \mid \bA\vx = \vb\}$ is at least $\gamma / \sqrt{k}$, where $\gamma$ is the minimum value of a nonzero coordinate of a vertex in the polytope and $k\leq n$ is the number of tight inequality constraints in the optimal face,
and
2) the facial distance for polytopes of the form $\{\vx \in \R^n \mid \bA\vx = \vb, \bC \vx \leq \vd, \vx \geq \mathbf{0}\}$, where $\bC \geq \mathbf{0}$ is a nonzero integral matrix and $\vd \geq \mathbf{0}$, is at least $1/(\|\bC\|_\infty \sqrt{n})$.
This yields the first such results for several problems, such as sequence-form polytopes, flow polytopes, and matching polytopes.
\end{abstract}