
1: \begin{abstract}%   <- trailing '%' for backward compatibility of .sty file

2: Emphatic Temporal Difference (TD) methods are a class of off-policy Reinforcement Learning (RL) methods that use followon traces.

3: Despite the theoretical success of emphatic TD methods in addressing the notorious deadly triad of off-policy RL,

4: there are still two open problems.

5: % First, the motivation for emphatic TD methods by \citet{sutton2016emphatic} does not align with the convergence analysis of \citet{yu2015convergence}.

6: % Namely,

7: % a quantity in \citet{sutton2016emphatic} that is expected to be essential for the convergence of emphatic TD methods is not used in the actual convergence analysis.

8: First, followon traces typically suffer from large variance,

9: making them hard to use in practice.

10: Second, though \citet{yu2015convergence} confirms the asymptotic convergence of some emphatic TD methods for prediction problems,

11: there is still no finite sample analysis for any emphatic TD method for prediction, much less control.

12: In this paper,

13: we address both open problems simultaneously by using \emph{truncated followon traces} in emphatic TD methods.

14: Unlike the original followon traces, which depend on all previous history,

15: truncated followon traces depend only on a finite history, which reduces variance and enables the finite sample analysis of our proposed emphatic TD methods for both prediction and control.

16: \end{abstract}

17: