
\begin{abstract}

Multi-agent reinforcement learning (MARL) poses unique challenges because agents learn their strategies from experience while interacting with one another. Gradient-based training methods are often sensitive to the choice of hyperparameters and to the random seed used at initialization.

In parallel, significant advances have been made in solving variational inequalities (VIs), a problem class that includes equilibrium finding, particularly in handling the rotational dynamics that prevent traditional gradient-based optimization methods from converging.

%

This paper explores whether VI-based techniques can improve MARL training. Specifically, we study how two VI methods, Nested-Lookahead VI (nLA-VI) and Extragradient (EG), can enhance the \textit{multi-agent deep deterministic policy gradient} (MADDPG) algorithm.

We present a VI reformulation of the actor-critic algorithm for both single- and multi-agent settings. We introduce three algorithms that use nLA-VI, EG, and a combination of both, named \emph{LA-MADDPG}, \emph{EG-MADDPG}, and \emph{LA-EG-MADDPG}, respectively.

Our empirical results show that these VI-based approaches yield significant performance improvements on benchmark environments, including the zero-sum games \textit{rock-paper-scissors} and \textit{matching pennies}, where equilibrium strategies can be assessed quantitatively, and the \textit{Multi-Agent Particle Environment: Predator--Prey} benchmark, where VI-based methods also yield balanced participation among agents of the same team.

\end{abstract}
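The rotational dynamics mentioned above can be illustrated on matching pennies, one of the zero-sum games used in the evaluation. The following is a minimal sketch, not the paper's implementation: it assumes the centered parametrization $u = 2p-1$, $v = 2q-1$ of the two players' probabilities of playing heads, so player 1 maximizes $f(u,v) = uv$ and player 2 minimizes it, with equilibrium $u = v = 0$. Strategies are left unconstrained (no projection onto $[-1,1]$) to isolate the rotation. The step size $0.1$, the Lookahead parameters $k=5$ and $\alpha=0.5$, and all function names are hypothetical choices for illustration, and only a single Lookahead level is shown, whereas nLA-VI nests several.

```python
"""Toy comparison of GDA, Extragradient and Lookahead on matching pennies.

Hypothetical sketch: player 1 maximizes f(u, v) = u * v, player 2 minimizes
it; u = 2p - 1 and v = 2q - 1 are centered probabilities of "heads".
Strategies are unconstrained (no projection) to isolate the rotation.
"""
import math


def field(u, v):
    # Simultaneous update direction: player 1 ascends along df/du = v,
    # player 2 descends along df/dv = u.
    return v, -u


def gda_step(u, v, lr):
    # Simultaneous gradient descent-ascent: spirals away from (0, 0).
    du, dv = field(u, v)
    return u + lr * du, v + lr * dv


def eg_step(u, v, lr):
    # Extragradient: extrapolate, then update from the ORIGINAL point
    # using the direction evaluated at the extrapolated point.
    du, dv = field(u, v)
    u_half, v_half = u + lr * du, v + lr * dv
    du, dv = field(u_half, v_half)
    return u + lr * du, v + lr * dv


def lookahead(step, k, alpha):
    # Single-level Lookahead wrapper (illustrative assumption; nested
    # variants stack several levels): run k fast steps of the base method,
    # then move the slow iterate a fraction alpha toward the fast iterate.
    def la_step(u, v, lr):
        fu, fv = u, v
        for _ in range(k):
            fu, fv = step(fu, fv, lr)
        return u + alpha * (fu - u), v + alpha * (fv - v)
    return la_step


def run(step, iters, lr=0.1, start=(0.5, 0.5)):
    # Apply `step` `iters` times and return the distance to equilibrium (0, 0).
    u, v = start
    for _ in range(iters):
        u, v = step(u, v, lr)
    return math.hypot(u, v)


if __name__ == "__main__":
    print(f"start  : {math.hypot(0.5, 0.5):.4f}")
    print(f"GDA    : {run(gda_step, 500):.4f}")
    print(f"EG     : {run(eg_step, 500):.4f}")
    # 100 outer steps x 5 inner steps = 500 base steps, matching the others.
    print(f"LA-GDA : {run(lookahead(gda_step, k=5, alpha=0.5), 100):.4f}")
    print(f"LA-EG  : {run(lookahead(eg_step, k=5, alpha=0.5), 100):.4f}")
```

With these settings, GDA drifts away from the equilibrium because each step scales the distance by $\sqrt{1+\eta^2}$, while EG contracts it by $\sqrt{1-\eta^2+\eta^4}$; wrapping either base method in Lookahead averages out part of the rotation, which is the intuition behind combining the two.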
