
\begin{abstract}

Many practical problems can be modeled as multi-player games in which self-interested players each optimize a local objective that also depends on the actions taken by the other players.

The local gradient information of each player, which is essential for implementing algorithms that find game solutions, is often unavailable.

In this paper, we design solution algorithms for multi-player games under bandit feedback, i.e., the only feedback available to each player is its realized objective value.

To reduce the large variance of existing bandit learning algorithms that use a single oracle call, we propose two algorithms that integrate the residual feedback scheme into single-call extra-gradient methods.

We then show that the actual sequences of play converge almost surely to a critical point if the game is pseudo-monotone plus, and we characterize the convergence rate to the critical point when the game is strongly pseudo-monotone.

As a supplement, we also investigate the ergodic convergence rates of the generated sequences in monotone games.

Finally, numerical examples verify the validity of the proposed algorithms.

\end{abstract}
