\begin{abstract}
Restless multi-armed bandits (RMABs) have been widely used to address resource allocation problems with Markov reward processes (MRPs).
Existing works often assume that the dynamics of the MRPs are known a priori, which makes the RMAB problem solvable from an optimization perspective.
Nevertheless, an efficient learning-based solution for RMABs with unknown system dynamics remains an open problem.
In this paper, we study the cooperative resource allocation problem in which the system dynamics of the MRPs are unknown.
This problem can be modeled as a multi-agent online RMAB problem, where multiple agents collaboratively learn the system dynamics while maximizing their accumulated rewards.
We devise a federated online RMAB framework that adopts the federated learning paradigm to mitigate communication overhead and data privacy issues.
Based on this framework, we put forth a Federated Thompson Sampling-enabled Whittle Index (FedTSWI) algorithm to solve this multi-agent online RMAB problem.
The FedTSWI algorithm enjoys high communication and computation efficiency as well as a privacy guarantee.
Moreover, we derive a regret upper bound for the FedTSWI algorithm.
Finally, we demonstrate the effectiveness of the proposed algorithm on the case of online multi-user multi-channel access.
Numerical results show that the proposed algorithm achieves a fast convergence rate of $\mathcal{O}(\sqrt{T\log(T)})$ and better performance than the baselines.
More importantly, its sample complexity decreases with the number of agents.
\end{abstract}
