\begin{abstract}
    The growing literature on \textit{Federated Learning} (FL) has recently inspired \textit{Federated Reinforcement Learning} (FRL), which encourages multiple agents to collaboratively build a \emph{better} decision-making policy without sharing raw trajectories. Despite its promising applications, existing works on FRL fail to (i) provide a theoretical analysis of its convergence and (ii) account for random system failures and adversarial attacks. To this end, we propose the first FRL framework whose convergence is guaranteed and which tolerates fewer than half of the participating agents suffering random system failures or acting as adversarial attackers. We prove that the sample efficiency of the proposed framework is guaranteed to improve with the number of agents, and that this guarantee accounts for such potential failures or attacks. All theoretical results are empirically verified on various RL benchmark tasks.
    Our code is available at \href{https://github.com/flint-xf-fan/Byzantine-Federeated-RL}{https://github.com/flint-xf-fan/Byzantine-Federeated-RL}.
    % the sample efficiency of which is guaranteed to scale with the number agents 
    % \textit{provably} sample-efficient and Byzantine-resilient FRL algorithm, called \textit{Federated Policy Gradient with Byzantine Resilience} (FedPG-BR), that converges to an $\epsilon$-stationary point within $O(\frac{1}{\epsilon^{5/3}K^{2/3}} + \frac{\alpha^{4/3}}{\epsilon^{5/3}})$ trajectories per agent. 
    % We empirically demonstrate the performance of FedPG-BR and its effectiveness against different Byzantine failures on various RL benchmarking tasks.
\end{abstract}