
\begin{abstract}

To align mobile robot navigation policies with user preferences through reinforcement learning from human feedback (RLHF), reliable and behavior-diverse user queries are required.

However, deterministic policies fail to generate a variety of navigation trajectory suggestions for a given navigation task configuration.

We introduce EnQuery, a query generation approach that uses an ensemble of policies whose behavioral diversity is achieved through a regularization term.

For a given navigation task, EnQuery produces multiple navigation trajectory suggestions, making preference data collection more efficient because fewer queries are needed.

Our approach demonstrates superior performance in aligning navigation policies with user preferences in low-query regimes, with improved policy convergence from sparse preference queries.

The evaluation is complemented by a novel explainability representation that captures the full-scene navigation behavior of the mobile robot in a single plot.

\end{abstract}