\begin{abstract}
 %Temporal difference (TD) learning is a simple algorithm for policy evaluation in reinforcement learning.
%The performance of TD learning is affected by high variance and it can be  naturally enhanced with  variance reduction techniques, such as the Stochastic Variance Reduced Gradient (SVRG) method. 
Temporal difference (TD) learning is a policy evaluation algorithm in reinforcement learning whose performance can be enhanced by variance reduction techniques. %such as the Stochastic Variance Reduced Gradient (SVRG) method. 
Recently, multiple works have sought to fuse TD learning with the Stochastic Variance Reduced Gradient (SVRG) method to obtain a policy evaluation method with a geometric rate of convergence. 
However, the resulting convergence rate is significantly weaker than the rate that SVRG achieves in convex optimization. 
In this work, we use a recent interpretation of TD learning as the splitting of the gradient of an appropriately chosen function, which simplifies the algorithm and allows us to fuse TD with SVRG. Our main result is a geometric convergence bound with a predetermined learning rate of $1/8$, which is identical to the convergence bound available for SVRG in the convex setting. Our theoretical findings are supported by a set of experiments.
\end{abstract}

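To illustrate what fusing TD with SVRG means in the simplest setting, the following is a minimal sketch of a generic SVRG-style variance-reduced TD(0) update, assuming linear features and a fixed, finite set of sampled transitions. It is not the gradient-splitting algorithm analyzed in this paper. Each inner step moves along $g_i(\theta) - g_i(\tilde\theta) + \bar g(\tilde\theta)$, where $g_i$ is the TD(0) direction of transition $i$, $\bar g$ is its average over the dataset, and $\tilde\theta$ is the snapshot point of the current epoch. The helper names `td_fixed_point` and `svrg_td`, the toy four-state cycle, and the step size $0.05$ are hypothetical choices for illustration; in particular, $0.05$ is not the learning-rate setting of the main result.

```python
import numpy as np


def td_fixed_point(phi, rewards, phi_next, gamma):
    """Solve A theta = b for the linear TD(0) fixed point on a fixed dataset.

    A = mean_i phi_i (phi_i - gamma * phi'_i)^T,  b = mean_i phi_i * r_i.
    """
    n = len(rewards)
    A = phi.T @ (phi - gamma * phi_next) / n
    b = phi.T @ rewards / n
    return np.linalg.solve(A, b)


def td_directions(theta, phi, rewards, phi_next, gamma):
    """TD(0) direction phi_i * delta_i for every transition i (shape: n x d)."""
    delta = rewards + gamma * (phi_next @ theta) - phi @ theta  # TD errors
    return phi * delta[:, None]


def svrg_td(phi, rewards, phi_next, gamma, step_size, epochs, inner_steps, seed=0):
    """Hypothetical SVRG-style variance-reduced TD(0) with linear features."""
    rng = np.random.default_rng(seed)
    n, d = phi.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        snapshot = theta.copy()
        # Per-transition and full-batch TD directions at the snapshot point.
        snap_dirs = td_directions(snapshot, phi, rewards, phi_next, gamma)
        mean_dir = snap_dirs.mean(axis=0)
        for _ in range(inner_steps):
            i = rng.integers(n)  # sample one stored transition uniformly
            delta_i = rewards[i] + gamma * (phi_next[i] @ theta) - phi[i] @ theta
            g_i = phi[i] * delta_i
            # Unbiased variance-reduced direction; its noise vanishes at the fixed point.
            theta = theta + step_size * (g_i - snap_dirs[i] + mean_dir)
    return theta


# Toy example: deterministic 4-state cycle s -> s+1 (mod 4) with one-hot features.
n_states, gamma = 4, 0.5
phi = np.eye(n_states)
phi_next = np.roll(np.eye(n_states), -1, axis=0)  # row s is the one-hot of s+1
rewards = np.array([1.0, 0.0, 0.0, 0.0])

theta = svrg_td(phi, rewards, phi_next, gamma, step_size=0.05, epochs=40, inner_steps=500)
print(np.allclose(theta, td_fixed_point(phi, rewards, phi_next, gamma), atol=1e-6))
```

The correction term $g_i(\tilde\theta) - \bar g(\tilde\theta)$ has zero mean, so the update stays unbiased, and its variance shrinks as $\theta$ and $\tilde\theta$ approach the TD fixed point, which is what allows a constant step size in this sketch.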