\begin{abstract}
In reinforcement learning, the \TD\ algorithm is a fundamental policy evaluation method with an efficient online implementation that is suitable for large-scale problems.
%
One practical drawback of \TD\ is its sensitivity to the choice of step-size. Empirically, it is well known that a large step-size speeds up convergence at the cost of higher variance and a greater risk of instability.
% this work
In this work, we introduce the \emph{implicit \TD} algorithm, which serves the same purpose and has the same computational cost as \TD, but is significantly more stable. We provide a theoretical explanation of this stability and evaluate implicit \TD\ empirically on typical benchmark tasks.
% results
Our results show that implicit \TD\ outperforms both standard \TD\ and a state-of-the-art method that tunes the step-size automatically, and thus holds promise for a wide range of applications.
\end{abstract}