\begin{abstract}
    Reinforcement learning (RL) struggles to evaluate policy trajectories in intricate game tasks because comprehensive and precise reward functions are difficult to design.
    This difficulty limits the broader application of RL in game environments with diverse constraints.
    Preference-based reinforcement learning (PbRL) offers a framework that uses human preferences as reward signals, circumventing the need for meticulous reward engineering.
    However, collecting preference data from human experts is costly and inefficient, especially when tasks involve complex constraints.
    To tackle this challenge, we propose \modelname{}, an automatic preference generation framework that harnesses large language models (LLMs) to abstract trajectories, rank preferences, and reconstruct reward functions for optimizing conditioned policies.
    Experiments on tasks with complex language constraints demonstrate the effectiveness of our LLM-enabled reward functions, which accelerate RL convergence and overcome the stagnation that arises when the original reward structures yield slow or no progress.
    This approach reduces reliance on specialized human knowledge and demonstrates the potential of LLMs to enhance the effectiveness of RL in complex, in-the-wild environments.
\end{abstract}