\begin{abstract}
We study a Federated Reinforcement Learning (FedRL) problem with constraint heterogeneity.
In our setting, we aim to solve a reinforcement learning problem with multiple constraints, where $N$ training agents are located in $N$ different environments, each with limited access to the constraint signals, and must collaboratively learn a policy that satisfies all of the constraints.
Such learning problems arise in Large Language Model (LLM) fine-tuning and in healthcare applications.
To solve this problem, we propose federated primal-dual policy optimization methods built on standard policy gradient methods.
Specifically, we introduce $N$ local Lagrangian functions, one per agent, on which the agents perform local policy updates; the agents then periodically communicate their local policies.
Taking natural policy gradient (NPG) and proximal policy optimization (PPO) as the policy optimization methods, we focus mainly on two instances of our methods, \ie, {FedNPG} and {FedPPO}.
We show that
FedNPG achieves global convergence at an $\tilde{O}(1/\sqrt{T})$ rate, and that FedPPO, when combined with deep neural networks, efficiently solves complex learning tasks.
\end{abstract}
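As a minimal illustration of the local primal-dual scheme, assume (for this sketch only; none of this notation is defined above) a discounted constrained MDP with reward $r$, constraint utilities $g_1,\dots,g_M$ with thresholds $b_1,\dots,b_M$, initial distribution $\rho$, and a hypothetical index set $\mathcal{C}_i \subseteq \{1,\dots,M\}$ of the constraint signals observable by agent $i$. With agent $i$ holding its own multipliers $\lambda_i \ge 0$, a local Lagrangian of the usual form and one round of local updates could then read
\begin{align*}
\mathcal{L}_i(\theta, \lambda_i) &= V_r^{\pi_\theta}(\rho) + \sum_{j \in \mathcal{C}_i} \lambda_{i,j}\bigl(V_{g_j}^{\pi_\theta}(\rho) - b_j\bigr),\\
\theta_i &\leftarrow \theta_i + \alpha\, \hat{d}_i, \qquad \hat{d}_i \approx \text{an NPG or PPO ascent direction for } \mathcal{L}_i(\cdot, \lambda_i),\\
\lambda_{i,j} &\leftarrow \bigl[\lambda_{i,j} - \beta\bigl(V_{g_j}^{\pi_{\theta_i}}(\rho) - b_j\bigr)\bigr]_+, \qquad j \in \mathcal{C}_i,
\end{align*}
followed, every $E$ local iterations, by a communication step that, for instance, averages the policy parameters, $\theta_i \leftarrow \frac{1}{N}\sum_{k=1}^{N}\theta_k$ for all $i$. The step sizes $\alpha$ and $\beta$, the period $E$, and parameter averaging are illustrative assumptions, not necessarily the exact update rules of FedNPG or FedPPO. Because of the projection $[\cdot]_+$, the multiplier $\lambda_{i,j}$ grows whenever constraint $j$ is violated under the current local policy ($V_{g_j}^{\pi_{\theta_i}}(\rho) < b_j$), which pushes the next local policy update toward feasibility.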