\begin{abstract}
Policy evaluation is concerned with estimating the value function that predicts the long-term values of states under a given policy. It is a crucial step in many reinforcement-learning algorithms. In this paper, we focus on policy evaluation with linear function approximation over a \emph{fixed} dataset.
We first transform the empirical policy evaluation problem into a
(quadratic) convex-concave saddle-point problem, and then present a primal-dual batch gradient method and two stochastic variance reduction methods for solving it.
These algorithms scale linearly in both sample size and feature dimension.
Moreover, they achieve \emph{linear} convergence even when the saddle-point problem is strongly concave in the dual variables
but \emph{not} strongly convex in the primal variables.
Numerical experiments on benchmark problems
demonstrate the effectiveness of our methods.
\end{abstract}