
\begin{abstract}
  We consider the problem of finding a Nash equilibrium for two-player turn-based zero-sum
games. Inspired by the AlphaGo Zero (AGZ) algorithm~\citep{silver2017mastering}, we
develop a Reinforcement Learning based approach.
Specifically, we propose the Explore-Improve-Supervise (EIS) method, which
combines ``exploration'', ``policy improvement'' and ``supervised learning'' to find the
value function and policy associated with a Nash equilibrium. We identify sufficient conditions for the convergence and correctness
of such an approach. For a concrete instance of EIS where a random policy is used for
``exploration'', Monte-Carlo Tree Search is used for ``policy improvement'' and Nearest Neighbors is used for ``supervised learning'', we establish that this method finds an $\varepsilon$-approximate value function of the Nash equilibrium
in $\widetilde{O}(\varepsilon^{-(d+4)})$ steps when the underlying state space of the game is continuous and $d$-dimensional. This is nearly optimal, as we establish
a lower bound of $\widetilde{\Omega}(\varepsilon^{-(d+2)})$ for any policy.
\end{abstract}
