d644a6a7cd7175a8.tex
1: \begin{abstract}
2:     We consider $Q$-learning with knowledge transfer, using samples from a target reinforcement learning (RL) task as well as source samples from different but related RL tasks.
3:    % The {\em similarity} between target and source RL tasks is characterized by the sparsity of the differences between the coefficients of their corresponding reward functions.
4:    We propose transfer learning algorithms for both batch and online $Q$-learning with offline source studies.
5:     The proposed transferred $Q$-learning algorithm includes a novel {\em re-targeting} step that enables vertical information-cascading across the multiple steps of an RL task, in addition to the usual horizontal information-gathering of transfer learning (TL) for supervised learning.
6:     We establish the first theoretical justifications of TL in RL tasks by showing a faster rate of convergence of the $Q$-function estimation in offline RL transfer, and a lower regret bound in offline-to-online RL transfer, under certain similarity assumptions.
7:     Empirical evidence from both synthetic and real datasets is presented to back up the proposed algorithm and our theoretical results.
8:     % that the transferred $Q$-learning with source samples converges faster than that without source samples.
9: \end{abstract}
10: