\begin{abstract}
    We consider $Q$-learning with knowledge transfer, using samples from a target reinforcement learning (RL) task as well as source samples from different but related RL tasks.
    % The {\em similarity} between target and source RL tasks is characterized by the sparsity of the differences between the coefficients of their corresponding reward functions.
    We propose transfer learning algorithms for both batch and online $Q$-learning with offline source studies.
    The proposed transferred $Q$-learning algorithm contains a novel {\em re-targeting} step that enables vertical information cascading across the multiple steps of an RL task, in addition to the usual horizontal information gathering performed by transfer learning (TL) in supervised learning.
    We establish the first theoretical justification of TL in RL tasks by showing, under certain similarity assumptions, a faster rate of convergence for $Q$-function estimation in offline RL transfer and a lower regret bound in offline-to-online RL transfer.
    We present empirical evidence from both synthetic and real datasets to support the proposed algorithm and our theoretical results.
    % that the transferred $Q$-learning with source samples converges faster than that without source samples.
\end{abstract}