\begin{abstract}
			This paper considers no-regret learning for repeated continuous-kernel
			games with lossy bandit feedback.
			Since an explicit model of the utility functions is difficult to
			obtain in dynamic environments, the players can only learn their
			actions from bandit feedback.
			Moreover, because of unreliable communication channels or privacy
			protection, the bandit feedback may be lost or dropped at random.
			Therefore, we study an asynchronous online learning strategy
			with which the players adaptively adjust their next actions to
			minimize the long-term regret.
			We propose a novel no-regret learning algorithm,
			called Online Gradient Descent with lossy bandits (OGD-lb).
			We first give a regret analysis for concave games with
			differentiable and Lipschitz utilities.
			Then we show that the action profile converges to a Nash equilibrium
			with probability 1 when the game is also strictly monotone.
			We further provide the mean square convergence rate
			{$\mathcal{O}\left(k^{-2\min\{\beta, 1/6\}}\right)$} when
			the game is $\beta$-strongly monotone.
			In addition, we extend the algorithm to the case where the loss
			probability of the bandit feedback is unknown, and prove its
			almost sure convergence to a Nash equilibrium for strictly
			monotone games.
			Finally, we take resource management in fog computing as an
			application example, and carry out numerical experiments to
			empirically demonstrate the performance of the algorithm.
		\end{abstract}