1: \begin{abstract}
2: We consider $Q$-learning with knowledge transfer, using samples from a target reinforcement learning (RL) task as well as source samples from different but related RL tasks.
3: % The {\em similarity} between target and source RL tasks is characterized by the sparsity of the differences between the coefficients of their corresponding reward functions.
4: We propose transfer learning algorithms for both batch and online $Q$-learning with offline source studies.
5: The proposed transferred $Q$-learning algorithm contains a novel {\em re-targeting} step that enables vertical information-cascading along multiple steps in an RL task, in addition to the usual horizontal information-gathering of transfer learning (TL) for supervised learning.
6: We establish the first theoretical justifications of TL in RL tasks by showing a faster rate of convergence of the $Q$-function estimation in the offline RL transfer, and a lower regret bound in the offline-to-online RL transfer, under certain similarity assumptions.
7: Empirical evidence from both synthetic and real datasets is presented to back up the proposed algorithm and our theoretical results.
8: % that the transferred $Q$-learning with source samples converges faster than that without source samples.
9: \end{abstract}
10: