\begin{abstract}
  This paper develops an online inverse reinforcement learning algorithm that efficiently
  recovers a reward function from ongoing observations of an agent's actions. To reduce the
  computation time and storage space needed for reward estimation, this work assumes that each
  observed action implies a change in the Q-value distribution, and relates that change to the
  reward function via the gradient of the Q-value with respect to the reward function parameters.
  The gradients are computed with a novel Bellman Gradient Iteration method, which allows the
  reward function to be updated whenever a new observation becomes available. The method is
  proved to converge to a local optimum. The proposed method is tested in two simulated
  environments, and its performance is evaluated under both a linear and a non-linear reward
  function. The results show that the algorithm requires only limited computation time and
  storage space, yet its accuracy increases as the number of observations grows. We also
  present a potential application to household cleaning robots.
\end{abstract}