\begin{abstract}

In image-based robot manipulation tasks with large observation and action spaces, reinforcement learning suffers from low sample efficiency, slow training, and uncertain convergence.

As an alternative, large pre-trained foundation models have shown promise in robotic manipulation, particularly in zero-shot and few-shot applications.

However, using these models directly is unreliable because of their limited reasoning capabilities and their difficulty in understanding physical and spatial contexts.

This paper introduces ExploRLLM, a novel approach that leverages the inductive bias of foundation models (e.g., large language models) to guide exploration in reinforcement learning.

We also use these foundation models to reformulate the action and observation spaces, which improves the training efficiency of reinforcement learning.

Our experiments demonstrate that guided exploration leads to substantially faster convergence than training without it.

Additionally, we show that ExploRLLM outperforms vanilla foundation model baselines and that the policy trained in simulation can be applied in real-world settings without additional training.

Code and videos are available at \link{https://explorllm.github.io}.

\end{abstract}
