
\begin{abstract}

    Reinforcement learning (RL) struggles to evaluate policy trajectories in intricate game tasks because comprehensive and precise reward functions are difficult to design.

    This inherent difficulty curtails the broader application of RL within game environments characterized by diverse constraints.

    Preference-based reinforcement learning (PbRL) offers a framework that uses human preferences as reward signals, thereby circumventing the need for meticulous reward engineering.

    However, obtaining preference data from human experts is costly and inefficient, especially under conditions marked by complex constraints.

% To tackle this challenge, our study harnesses the capabilities of large language models (LLMs) to abstract trajectories, rank preferences, and reconstruct reward functions to optimize conditioned policies.

    To tackle this challenge, we propose \modelname, an automatic preference generation framework that harnesses the capabilities of large language models (LLMs) to abstract trajectories, rank preferences, and reconstruct reward functions to optimize conditioned policies.

    Experiments on tasks with complex language constraints demonstrate the effectiveness of our LLM-enabled reward functions, which accelerate RL convergence and overcome the stagnation that arises from slow or absent progress under the original reward structures.

    This approach reduces the reliance on specialized human knowledge and demonstrates the potential of LLMs to enhance the effectiveness of RL in complex, in-the-wild environments.

\end{abstract}
