\begin{abstract}
We study the learning of Nash equilibria in a general-sum stochastic game with an unknown transition probability density
function. Agents take actions at the current environment state, and their joint action influences both the transition of the environment state and their
immediate rewards. Each agent observes only the environment state
and its own immediate reward, and has no knowledge of the actions or immediate rewards of the others. We introduce the concepts of weighted asymptotic Nash equilibrium with probability $1$ and in probability. For the case with exact pseudo gradients, we design a two-loop algorithm based on the equivalence between Nash equilibrium problems and variational inequality problems.
In the outer loop, we sequentially update a constructed strongly monotone variational inequality by updating a proximal parameter, while in the inner loop we employ a single-call extra-gradient algorithm to solve the constructed variational inequality. We show that if the associated Minty variational inequality has a solution, then the designed algorithm converges to the $k^{\frac{1}{2}}$-weighted asymptotic Nash equilibrium. Further, for the case with unknown pseudo gradients, we propose a decentralized algorithm in which a G(PO)MDP gradient estimator of the pseudo gradient is obtained from Monte-Carlo simulations. Convergence to the $k^{\frac{1}{4}}$-weighted asymptotic Nash equilibrium in probability is established.
\end{abstract}