1: \begin{abstract}
2: This paper considers no-regret learning for repeated continuous-kernel
3: games
4: with lossy bandit
5: feedback.
Since it is difficult to give an explicit model of the utility functions in
dynamic environments, the players' actions can only be learned with bandit
feedback.
11: Moreover, because of unreliable communication channels or privacy
12: protection,
13: the bandit
14: feedback may be
15: lost or dropped at random.
Therefore, we study an asynchronous online learning strategy by which the
players adaptively adjust their next actions to minimize the long-term regret.
20: The paper provides a novel no-regret learning algorithm,
21: called Online Gradient Descent with lossy bandits (OGD-lb).
22: We first give the regret analysis for concave games with
23: differentiable
24: and Lipschitz utilities.
25: Then we show that the action profile converges to a Nash equilibrium
26: with
27: probability 1
28: when the game is also strictly monotone.
We further provide the mean square convergence rate
{$\mathcal{O}\left(k^{-2\min\{\beta, 1/6\}}\right)$} when
the game is $\beta$-strongly monotone.
In addition, we extend the algorithm to the case where the loss probability of
the bandit feedback is unknown, and prove its almost sure convergence to a Nash
equilibrium for strictly monotone games.
Finally, we take resource management in fog computing as an application
example, and carry out numerical experiments to empirically demonstrate the
algorithm's performance.
42: \end{abstract}
43: