1: \begin{abstract}
2: \begin{quote}
3: %Human can inherently learn new tasks through applying relevant knowledge from previous experiences. This can accelerate the learning process significantly.
4: %Transfer learning can improve the speed of a reinforcement learning algorithm greatly as well.
5: Transfer learning significantly accelerates the reinforcement learning process by exploiting relevant knowledge from previous experiences.
6: Optimally selecting source policies during the learning process
7: is important yet challenging.
8: There has been little theoretical analysis of this problem.
9: In this paper, we
10: develop an optimal online method to select source policies for reinforcement learning.
11: This method formulates online source policy selection as a multi-armed bandit problem and augments Q-learning with policy reuse.
12: We provide theoretical guarantees for the optimality of the selection process and for convergence to the optimal policy.
13: In addition, we conduct experiments in a grid-based robot navigation domain to demonstrate the method's efficiency and robustness in comparison with the state-of-the-art transfer learning method.
14: \end{quote}
15: \end{abstract}
16: