\begin{abstract}
Policy gradient methods enjoy strong practical performance across numerous reinforcement learning tasks. Their theoretical understanding in multiagent settings, however, remains limited, especially beyond two-player competitive and potential Markov games. In this paper, we develop a new framework to characterize \emph{optimistic} policy gradient methods in multi-player Markov games with a \emph{single controller}. Specifically, under the further assumption that the game exhibits an \emph{equilibrium collapse}, in that the marginals of coarse correlated equilibria (CCE) induce Nash equilibria (NE), we show convergence to \emph{stationary} $\epsilon$-NE in $O(1/\epsilon^2)$ iterations, where $O(\cdot)$ suppresses polynomial factors in the natural parameters of the game. Such an equilibrium collapse is well known to manifest itself in two-player zero-sum Markov games, but it also occurs in a class of multi-player Markov games with \emph{separable interactions}, as established by recent work. As a result, we bypass the complexity barriers known to arise for computing stationary NE when either of our assumptions fails. Our approach relies on a natural generalization of the classical \emph{Minty property} that we introduce, which we anticipate will have further applications beyond Markov games.
\end{abstract}