1: \begin{abstract}
2: Starting from a heuristic learning scheme for strategic $n$-person games, we derive a new class of continuous-time learning dynamics consisting of a replicator-like term adjusted by an entropic penalty that keeps players' strategies away from the boundary of the game's strategy space.
3: These \emph{entropy-driven} dynamics are equivalent to players taking an exponentially discounted aggregate of their ongoing payoffs and then using a quantal response choice model to pick an action based on these performance scores.
4: Owing to this inherent duality, these dynamics satisfy a variant of the folk theorem of evolutionary game theory and converge to (arbitrarily precise) quantal approximations of Nash equilibria in potential games.
5: Motivated by applications to traffic engineering, we exploit this duality to design a discrete-time, payoff-based learning algorithm that retains these convergence properties and only requires players to observe their in-game payoffs;
6: moreover, the algorithm remains robust in the presence of stochastic perturbations and observation errors, and it does not require any synchronization between players.
7: \end{abstract}
8: