\begin{abstract}
	This paper considers no-regret learning for repeated continuous-kernel games with lossy bandit feedback.
	Since explicit models of the utility functions are difficult to obtain in dynamic environments, the players must learn their actions from bandit feedback alone.
	Moreover, because of unreliable communication channels or privacy protection, the bandit feedback may be lost at random.
	We therefore study an asynchronous online learning strategy with which the players adaptively adjust their next actions to minimize the long-term regret.
	This paper proposes a novel no-regret learning algorithm, called Online Gradient Descent with lossy bandits (OGD-lb).
	We first analyze the regret of OGD-lb for concave games with differentiable and Lipschitz utilities.
	Then we show that the action profile converges to a Nash equilibrium with probability 1 when the game is also strictly monotone.
	We further establish the mean-square convergence rate $\mathcal{O}\left(k^{-2\min\{\beta, 1/6\}}\right)$ when the game is $\beta$-strongly monotone.
	In addition, we extend the algorithm to the case where the loss probability of the bandit feedback is unknown, and prove its almost sure convergence to a Nash equilibrium in strictly monotone games.
	Finally, taking resource management in fog computing as an application example, we carry out numerical experiments that empirically demonstrate the performance of the algorithm.
\end{abstract}