
\begin{abstract}

We show that learning algorithms satisfying a \emph{low approximate regret} property experience fast convergence to approximate optimality in a large class of repeated games. Our property, which simply requires that each learner has small regret compared to a $(1+\epsilon)$-multiplicative approximation to the best action in hindsight, is ubiquitous among learning algorithms; it is satisfied even by the vanilla Hedge forecaster. Our results improve upon recent work of Syrgkanis et al. \cite{SyrgkanisALS15} in a number of ways.

We require only that players observe payoffs under other players' realized actions, as opposed to expected payoffs. We further show that convergence occurs with high probability, and show convergence under bandit feedback.

Finally, we improve upon the speed of convergence by a factor of $n$, the number of players. Both the scope of settings and the class of algorithms for which our analysis provides fast convergence are considerably broader than in previous work.


Our framework applies to dynamic population games via a low approximate regret property for shifting experts. Here we strengthen the results of Lykouris et al. \cite{LykourisST16} in two ways: We allow players to select learning algorithms from a larger class, which includes a minor variant of the basic Hedge algorithm, and we increase the maximum churn in players for which approximate optimality is achieved.


In the bandit setting we present a new algorithm which provides a ``small loss''-type bound with improved dependence on the number of actions in utility settings, and is both simple and efficient. This result may be of independent interest.

\end{abstract}
