1: \begin{abstract}%   <- trailing '%' for backward compatibility of .sty file
2: Emphatic Temporal Difference (TD) methods are a class of off-policy Reinforcement Learning (RL) methods involving the use of followon traces. 
3: Despite the theoretical success of emphatic TD methods in addressing the notorious deadly triad of off-policy RL,
4: there are still two open problems.
5: % First, the motivation for emphatic TD methods by \citet{sutton2016emphatic} does not align with the convergence analysis of \citet{yu2015convergence}. 
6: % Namely, 
7: % a quantity in \citet{sutton2016emphatic} that is expected to be essential for the convergence of emphatic TD methods is not used in the actual convergence analysis.
8: First, followon traces typically suffer from large variance,
9: making them hard to use in practice. 
10: Second, though \citet{yu2015convergence} confirms the asymptotic convergence of some emphatic TD methods for prediction problems,
11: there is still no finite sample analysis for any emphatic TD method for prediction, much less control.
12: In this paper, 
13: we address both open problems simultaneously by using \emph{truncated followon traces} in emphatic TD methods.
14: Unlike the original followon traces, which depend on the entire history,
15: truncated followon traces depend on only a finite history, which reduces variance and enables a finite sample analysis of our proposed emphatic TD methods for both prediction and control.
16: \end{abstract}
17: