\begin{abstract}
%\input{abstract.tex}
Hannan consistency, or no external regret, is a~key concept for learning in games.
An action selection algorithm is Hannan consistent (HC) if its performance is eventually as good as selecting the~best fixed action in hindsight.
If both players in a~zero-sum normal form game use a~Hannan consistent algorithm, their average behavior converges to a~Nash equilibrium (NE) of the~game.
A similar result is known for extensive form games, but the~played strategies need to be Hannan consistent with respect to the~counterfactual values, which are often difficult to obtain.
We study zero-sum extensive form games with simultaneous moves but otherwise perfect information.
These games generalize normal form games and are a~special case of extensive form games.
We ask whether applying HC algorithms at each decision point of these games, directly to the~observed payoffs, leads to convergence to a~Nash equilibrium.
This learning process corresponds to a~class of Monte Carlo Tree Search algorithms, which are popular for playing simultaneous-move games but have no known performance guarantees.
We show that using HC algorithms directly on the~observed payoffs is not sufficient to guarantee convergence. With additional averaging over joint actions, convergence is guaranteed, but it is empirically slower. We further define an~additional property of HC algorithms that is sufficient to guarantee convergence without the~averaging, and we show empirically that commonly used HC algorithms have this property.
\end{abstract}