\begin{abstract}
To align mobile robot navigation policies with user preferences through reinforcement learning from human feedback (RLHF), reliable and behaviorally diverse user queries are required.
However, deterministic policies fail to generate a variety of navigation trajectory suggestions for a given navigation task configuration.
We introduce EnQuery, a query generation approach that uses an ensemble of policies whose behavioral diversity is encouraged by a regularization term.
For a given navigation task, EnQuery produces multiple navigation trajectory suggestions, making preference data collection more efficient by requiring fewer queries.
Our method achieves superior performance in aligning navigation policies with user preferences in low-query regimes, improving policy convergence from sparse preference queries.
We complement the evaluation with a novel explainability representation that captures the mobile robot's navigation behavior across the full scene in a single plot.
\end{abstract}