\begin{abstract}
  We consider the problem of finding a Nash equilibrium for two-player, turn-based, zero-sum
games. Inspired by the AlphaGo Zero (AGZ) algorithm~\citep{silver2017mastering}, we
develop a reinforcement-learning-based approach.
Specifically, we propose the Explore-Improve-Supervise (EIS) method, which
combines ``exploration'', ``policy improvement'', and ``supervised learning'' to find the
value function and policy associated with a Nash equilibrium. We identify sufficient conditions for the convergence and correctness
of such an approach. For a concrete instance of EIS in which a random policy is used for
``exploration'', Monte-Carlo Tree Search is used for ``policy improvement'', and Nearest Neighbors is used for ``supervised learning'', we establish that this method finds an $\varepsilon$-approximate value function of the Nash equilibrium
in {$\widetilde{O}(\varepsilon^{-(d+4)})$} steps when the underlying state space of the game is continuous and $d$-dimensional. This is nearly optimal, as we establish
a lower bound of {$\widetilde{\Omega}(\varepsilon^{-(d+2)})$} for any policy.
\end{abstract}
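
As a rough illustration of the gap between the two bounds stated in the abstract, suppose we ignore the logarithmic factors hidden by the tilde notation and all constants (an assumption made only for this back-of-the-envelope comparison). The ratio of the upper bound to the lower bound is then
\[
  \frac{\varepsilon^{-(d+4)}}{\varepsilon^{-(d+2)}} = \varepsilon^{-2},
\]
which does not depend on the dimension $d$. For instance, with $d = 2$ and $\varepsilon = 0.1$, the upper bound scales as $10^{6}$ steps and the lower bound as $10^{4}$ steps.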