\begin{abstract}
Robots often need to learn the human's reward function online, during the current interaction.
This real-time learning requires fast but approximate learning rules: when the human's behavior is noisy or suboptimal, current approximations can result in unstable robot learning.
Accordingly, in this paper we seek to enhance the robustness and convergence properties of gradient descent learning rules when inferring the human's reward parameters.
We model the robot's learning algorithm as a \textit{dynamical system} over the human preference parameters, where the human's true (but unknown) preferences are the equilibrium point.
This enables us to perform Lyapunov stability analysis to derive the conditions under which the robot's learning dynamics converge.
{Our proposed algorithm (StROL) uses these conditions to learn robust-by-design learning rules: given the original learning dynamics, StROL outputs a modified learning rule that now converges to the human's true parameters under a larger set of human inputs.
In practice, these autonomously generated learning rules can correctly infer what the human is trying to convey, even when the human is noisy, biased, and suboptimal.}
Across simulations and a user study, we find that StROL results in a more accurate estimate and less regret than state-of-the-art approaches for online reward learning.
See videos and code here: \url{https://github.com/VT-Collab/StROL_RAL}
\end{abstract}