\begin{abstract}
%Temporal difference (TD) learning is a simple algorithm for policy evaluation in reinforcement learning.
%The performance of TD learning is affected by high variance and it can be naturally enhanced with variance reduction techniques, such as the Stochastic Variance Reduced Gradient (SVRG) method.
Temporal difference (TD) learning is a policy evaluation method in reinforcement learning whose performance can be enhanced by variance reduction techniques such as the stochastic variance reduced gradient (SVRG) method.
Recently, multiple works have sought to fuse TD learning with SVRG to obtain a policy evaluation method with a geometric rate of convergence.
However, the resulting convergence rate is significantly weaker than what is achieved by SVRG in the setting of convex optimization.
In this work, we use a recent interpretation of TD learning as the splitting of the gradient of an appropriately chosen function, which simplifies the algorithm and allows TD to be fused with SVRG. Our main result is a geometric convergence bound with a predetermined learning rate of $1/8$, which is identical to the convergence bound available for SVRG in the convex setting. Our theoretical findings are supported by a set of experiments.
\end{abstract}
