
1: \begin{abstract}

2: \begin{quote}

3: %Human can inherently learn new tasks through applying relevant knowledge from previous experiences. This can accelerate the learning process significantly.

4: %Transfer learning can improve the speed of a reinforcement learning algorithm greatly as well.

5: Transfer learning can significantly accelerate reinforcement learning by exploiting relevant knowledge from previous experience.

6: Optimally selecting source policies during the learning process

7: is important yet challenging.

8: There has been little theoretical analysis of this problem.

9: In this paper, we

10: develop an optimal online method to select source policies for reinforcement learning.

11: This method formulates online source policy selection as a multi-armed bandit problem and augments Q-learning with policy reuse.

12: We provide theoretical guarantees for both the optimal selection process and convergence to the optimal policy.

13: In addition, we conduct experiments on a grid-based robot navigation domain to demonstrate the efficiency and robustness of our method in comparison with the state-of-the-art transfer learning method.

14: \end{quote}

15: \end{abstract}

16: