
1: \begin{abstract}

2: Learning in strategy games (e.g., StarCraft, poker) requires the discovery of diverse policies. This is often achieved by iteratively training new policies against existing ones, growing a policy population that is robust to exploitation.

3: This iterative approach suffers from two issues in {\it real-world} games: a) under a finite budget, the approximate best-response operator at each iteration must be truncated, leaving under-trained {good-}responses in the population;

4: b) repeated learning of basic skills at each iteration is wasteful and becomes intractable in the presence of increasingly strong opponents.

5: In this work, we propose Neural Population Learning (\neupl) as a solution to both issues. \neupl offers convergence guarantees to a population of {\it best}-responses under mild assumptions. By representing a population of policies within a single conditional model, \neupl enables transfer learning across policies.

6: Empirically, we show the generality, improved performance and efficiency of \neupl across several test domains\footnote{See \url{https://neupl.github.io/demo/} for supplementary illustrations.}. Most interestingly, we show that novel strategies become more accessible, not less, as the neural population expands.

7: \end{abstract}

8: