
\begin{abstract}
We obtain global, non-asymptotic convergence guarantees for independent
learning algorithms in competitive reinforcement learning settings with
two agents (i.e.,~zero-sum stochastic games). We consider an episodic
setting where in each episode, each player independently selects a policy
and observes only \emph{their own} actions and rewards, along with the
state. We show that if both players run policy gradient methods in tandem,
their policies will converge to a min-max equilibrium of the game, as long
as their learning rates follow a two-timescale rule (which is necessary).
To the best of our knowledge, this constitutes the first finite-sample
convergence result for independent policy gradient methods in competitive
RL; prior work has largely focused on centralized, coordinated procedures
for equilibrium computation.
\end{abstract}
