\begin{abstract}
We show that learning algorithms satisfying a \emph{low approximate regret} property experience fast convergence to approximate optimality in a large class of repeated games. Our property, which simply requires that each learner has small regret compared to a $(1+\epsilon)$-multiplicative approximation to the best action in hindsight, is ubiquitous among learning algorithms; it is satisfied even by the vanilla Hedge forecaster. Our results improve upon recent work of Syrgkanis et al.~\cite{SyrgkanisALS15} in a number of ways.
We require only that players observe payoffs under other players' realized actions, as opposed to expected payoffs. We further show that convergence occurs with high probability, and that it also holds under bandit feedback.
Finally, we improve upon the speed of convergence by a factor of $n$, the number of players. Both the scope of settings and the class of algorithms for which our analysis provides fast convergence are considerably broader than in previous work.

Our framework applies to dynamic population games via a low approximate regret property for shifting experts. Here we strengthen
the results of Lykouris et al.~\cite{LykourisST16} in two ways: we allow players to select learning algorithms from a larger class, which includes a minor variant of the basic Hedge algorithm, and we increase the maximum churn in players for which approximate optimality is achieved.

In the bandit setting, we present a new algorithm that provides a ``small loss''-type bound with improved dependence on the number of actions in utility settings, and that is both simple and efficient. This result may be of independent interest.
\end{abstract}