1: \begin{abstract}
2: 
3: Robots often need to learn the human's reward function online, during the current interaction.
4: This real-time learning requires fast but approximate learning rules: when the human's behavior is noisy or suboptimal, current approximations can result in unstable robot learning.
5: Accordingly, in this paper we seek to enhance the robustness and convergence properties of gradient-descent learning rules when inferring the human's reward parameters.
6: We model the robot's learning algorithm as a \textit{dynamical system} over the human's preference parameters, where the human's true (but unknown) preferences are the equilibrium point.
7: This enables us to perform Lyapunov stability analysis to derive the conditions under which the robot's learning dynamics converge.
8: Our proposed algorithm (StROL) uses these conditions to learn robust-by-design learning rules: given the original learning dynamics, StROL outputs a modified learning rule that now converges to the human's true parameters under a larger set of human inputs.
9: In practice, these autonomously generated learning rules can correctly infer what the human is trying to convey, even when the human is noisy, biased, and suboptimal.
10: Across simulations and a user study, we find that StROL results in a more accurate estimate of the human's reward parameters and lower regret than state-of-the-art approaches for online reward learning.
11: See videos and code here: \url{https://github.com/VT-Collab/StROL_RAL}
12: 
13: \end{abstract}
14: