
\begin{abstract}

%Temporal difference (TD) learning is a simple algorithm for policy evaluation in reinforcement learning.

%The performance of TD learning is affected by high variance and it can be naturally enhanced with variance reduction techniques, such as the Stochastic Variance Reduced Gradient (SVRG) method.

Temporal difference (TD) learning is a policy evaluation method in reinforcement learning whose performance can be enhanced by variance reduction techniques. %such as the Stochastic Variance Reduced Gradient (SVRG) method.

Recently, multiple works have sought to fuse TD learning with the Stochastic Variance Reduced Gradient (SVRG) method to obtain a policy evaluation method with a geometric rate of convergence.

However, the resulting convergence rate is significantly weaker than what is achieved by SVRG in the setting of convex optimization.

In this work, we use a recent interpretation of TD learning as the splitting of the gradient of an appropriately chosen function, thus simplifying the algorithm and fusing TD with SVRG. Our main result is a geometric convergence bound with a predetermined learning rate of $1/8$, which is identical to the convergence bound available for SVRG in the convex setting. Our theoretical findings are supported by a set of experiments.

\end{abstract}
