\begin{abstract}

We study a Federated Reinforcement Learning (FedRL) problem with constraint heterogeneity.

In our setting, we aim to solve a reinforcement learning problem with multiple constraints, where $N$ training agents are located in $N$ different environments, each with limited access to the constraint signals, and are expected to collaboratively learn a policy that satisfies all constraints.

Such learning problems are prevalent in Large Language Model (LLM) fine-tuning and healthcare applications.

To solve the problem, we propose federated primal-dual policy optimization methods based on traditional policy gradient methods.

Specifically, we introduce $N$ local Lagrange functions with which the agents perform local policy updates, and the agents are scheduled to periodically communicate their local policies.

Taking natural policy gradient (NPG) and proximal policy optimization (PPO) as the policy optimization methods, we focus on two instances of our algorithms, \ie, {FedNPG} and {FedPPO}.

We show that FedNPG achieves global convergence at an $\tilde{O}(1/\sqrt{T})$ rate, and that FedPPO efficiently solves complicated learning tasks with the use of deep neural networks.

\end{abstract}
