\begin{abstract}
Restless multi-armed bandits (RMABs) have been widely used to address resource allocation problems with Markov reward processes (MRPs).
Existing works often assume that the dynamics of the MRPs are known a priori,
which makes the RMAB problem solvable from an optimization perspective.
Nevertheless, an efficient learning-based solution for RMABs with unknown system dynamics remains an open problem.
In this paper, we study the cooperative resource allocation problem in which the system dynamics of the MRPs are unknown.
This problem can be modeled as a multi-agent online RMAB problem, where multiple agents collaboratively learn the system dynamics while maximizing their accumulated rewards.
We devise a federated online RMAB framework that adopts the federated learning paradigm to mitigate communication overhead and data privacy concerns.
Based on this framework, we put forth a Federated Thompson Sampling-enabled Whittle Index (FedTSWI) algorithm to solve this multi-agent online RMAB problem.
The FedTSWI algorithm offers high communication and computation efficiency as well as a privacy guarantee.
Moreover, we derive a regret upper bound for the FedTSWI algorithm.
Finally, we demonstrate the effectiveness of the proposed algorithm in a case study of online multi-user multi-channel access.
Numerical results show that the proposed algorithm achieves a fast convergence rate of $\mathcal{O}(\sqrt{T\log(T)})$ and outperforms the baselines.
More importantly, its sample complexity decreases with the number of agents.
\end{abstract}