abstract:d9966467774343f4.tex

1: \begin{abstract}

2:

3: We consider batch $Q$ learning with knowledge transfer, using samples from the target RL task as well as auxiliary samples from different but possibly related RL tasks.

4: We define the {\em similarity} between the target and auxiliary RL tasks through the sparsity of the differences between the coefficients of their optimal $Q$ functions.

5: % Main literature include \cite{fan2020theoretical,jin2020provably}.

6: We propose a transferred $Q$ learning algorithm that iteratively uses the aggregated data to estimate a coarse optimal $Q$ function and then refines it using only the target data.

7: We establish the algorithmic and statistical rates of convergence of the estimated optimal $Q$ sequence obtained by transferred $Q$ learning.

8: Transferred $Q$ learning with auxiliary samples converges faster than $Q$ learning without auxiliary samples.

9:

10: \attn{Note that there are other ways to do transfer, depending on how we define similarity between tasks.}

11: \end{abstract}

12: