\begin{abstract}
Multi-agent reinforcement learning (MARL) presents unique challenges, as agents learn strategies through experience. Gradient-based methods are often sensitive to hyperparameter selection and to the choice of initial random seed.
Concurrently, significant advances have been made in solving Variational Inequalities (VIs), a class of problems that includes equilibrium finding, particularly in addressing the rotational dynamics that impede the convergence of traditional gradient-based optimization methods.
%
This paper explores the potential of leveraging VI-based techniques to improve MARL training. Specifically, we study the performance of two VI methods, Nested-Lookahead VI (nLA-VI) and Extragradient (EG), in enhancing the \textit{multi-agent deep deterministic policy gradient} (MADDPG) algorithm.
We present a VI reformulation of the actor-critic algorithm for both single- and multi-agent settings. We introduce three algorithms that use nLA-VI, EG, and a combination of both, named \emph{LA-MADDPG}, \emph{EG-MADDPG}, and \emph{LA-EG-MADDPG}, respectively.
Our empirical results demonstrate that these VI-based approaches yield significant performance improvements in benchmark environments such as the zero-sum games \textit{rock-paper-scissors} and \textit{matching pennies}, where equilibrium strategies can be quantitatively assessed, and the \textit{Multi-Agent Particle Environment: Predator-prey} benchmark, where VI-based methods also yield balanced participation of agents from the same team.
\end{abstract}