\begin{abstract}
In reinforcement learning, the \TD\ algorithm is a fundamental policy evaluation method with an efficient online implementation that is suitable for large-scale problems.
%
One practical drawback of \TD\ is its sensitivity to the choice of step-size. It is well known empirically that a large step-size leads to fast convergence, at the cost of higher variance and a risk of instability.
% this work
In this work, we introduce the \emph{implicit \TD} algorithm, which serves the same purpose and has the same computational cost as \TD, but is significantly more stable. We provide a theoretical explanation of this stability and an empirical evaluation of implicit \TD\ on
typical benchmark tasks.
% results
Our results show that implicit \TD\ outperforms both standard \TD\ and a state-of-the-art method that automatically tunes the step-size,
and thus shows promise for wide applicability.
\end{abstract}
