\begin{abstract}
We consider batch $Q$ learning with knowledge transfer, using samples from the target reinforcement learning (RL) task as well as auxiliary samples from different but possibly related RL tasks.
We define the {\em similarity} between the target and auxiliary RL tasks by the sparsity of the difference between the coefficients of their optimal $Q$ functions.
% Main literature includes \cite{fan2020theoretical,jin2020provably}.
We propose a transferred $Q$ learning algorithm that iteratively uses the aggregated data to estimate a coarse optimal $Q$ function and then refines it using only the target data.
We establish the algorithmic and statistical rates of convergence of the sequence of estimated optimal $Q$ functions obtained by transferred $Q$ learning.
Transferred $Q$ learning with auxiliary samples converges faster than $Q$ learning without auxiliary samples.

\attn{Note that there are other ways to do transfer, depending on how we define similarity between tasks.}
\end{abstract}