\begin{abstract}
Policy evaluation is concerned with estimating the value function that predicts long-term values of states under a given policy. It is a crucial step in many reinforcement-learning algorithms. In this paper, we focus on policy evaluation with linear function approximation over a \emph{fixed} dataset.
We first transform the empirical policy evaluation problem into a (quadratic) convex-concave saddle-point problem, and then present a primal-dual batch gradient method, as well as two stochastic variance reduction methods for solving the problem.
These algorithms scale linearly in both sample size and feature dimension.
Moreover, they achieve \emph{linear} convergence even when the saddle-point problem has only strong concavity in the dual variables but \emph{no} strong convexity in the primal variables.
Numerical experiments on benchmark problems demonstrate the effectiveness of our methods.
\end{abstract}