\begin{abstract}
%\input{abstract.tex}
Hannan consistency, or no external regret, is a~key concept for learning in games.
An action-selection algorithm is Hannan consistent (HC) if its performance is eventually as good as selecting the~best fixed action in hindsight.
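In one standard formalization, an~algorithm playing action $a_t$ against the~opponent's action $b_t$ in round $t$ of a~game with payoff function $u$ is HC if, almost surely, its average external regret vanishes. The~symbols $a_t$, $b_t$ and $u$ are assumed here for illustration only:
\[
  \limsup_{T\to\infty}\frac{1}{T}\Bigl(\max_{a}\sum_{t=1}^{T}u(a,b_t)-\sum_{t=1}^{T}u(a_t,b_t)\Bigr)\le 0 .
\]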
If both players in a~zero-sum normal-form game use a~Hannan-consistent algorithm, their average behavior converges to a~Nash equilibrium (NE) of the~game.
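A~standard argument makes this precise; the~following notation is assumed for illustration. Let player~1 receive $u$ and player~2 receive $-u$. Suppose that after $T$ rounds each player's average external regret, measured on the~expected payoffs of the~mixed strategies $\sigma_1^t,\sigma_2^t$, is at most $\epsilon$. Then the~average strategies $\bar\sigma_i=\frac{1}{T}\sum_{t=1}^{T}\sigma_i^t$ form a~$2\epsilon$-NE:
\[
  \max_{\sigma_1}u(\sigma_1,\bar\sigma_2)-u(\bar\sigma_1,\bar\sigma_2)\le 2\epsilon
  \quad\mbox{and}\quad
  u(\bar\sigma_1,\bar\sigma_2)-\min_{\sigma_2}u(\bar\sigma_1,\sigma_2)\le 2\epsilon .
\]
As HC drives $\epsilon$ to~$0$, the~profitability of any unilateral deviation vanishes.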
A~similar result is known for extensive-form games, but there the~played strategies must be Hannan consistent with respect to the~counterfactual values, which are often difficult to obtain.
We study zero-sum extensive-form games with simultaneous moves but otherwise perfect information. These games generalize normal-form games and are a~special case of extensive-form games.
We study whether applying HC algorithms at each decision point of these games directly to the~observed payoffs leads to convergence to a~Nash equilibrium.
This learning process corresponds to a~class of Monte Carlo Tree Search algorithms that are popular for playing simultaneous-move games but have no known performance guarantees.
We show that using HC algorithms directly on the~observed payoffs is not sufficient to guarantee convergence. With additional averaging over joint actions, convergence is guaranteed, but it is empirically slower. We further define a~property of HC algorithms that suffices to guarantee convergence without the~averaging, and we show empirically that commonly used HC algorithms have this property.
\end{abstract}