\begin{abstract}
Reinforcement learning (RL) struggles to evaluate policy trajectories in complex game tasks because comprehensive and precise reward functions are difficult to design.
This difficulty limits the broader application of RL in game environments with diverse constraints.
Preference-based reinforcement learning (PbRL) offers an alternative that uses human preferences as reward signals, removing the need for meticulous reward engineering.
However, collecting preference data from human experts is costly and inefficient, especially when tasks involve complex constraints.
To address this challenge, we propose \modelname{}, an automatic preference generation framework that harnesses large language models (LLMs) to abstract trajectories, rank preferences, and reconstruct reward functions for optimizing conditioned policies.
Experiments on tasks with complex language constraints demonstrate the effectiveness of the LLM-enabled reward functions: they accelerate RL convergence and overcome the stagnation caused by slow or absent progress under the original reward structures.
This approach reduces reliance on specialized human knowledge and shows the potential of LLMs to make RL more effective in complex environments in the wild.
\end{abstract}