\begin{abstract}
Policy evaluation is concerned with estimating the value function that predicts the long-term values of states under a given policy. It is a crucial step in many reinforcement-learning algorithms. In this paper, we focus on policy evaluation with linear function approximation over a \emph{fixed} dataset.
We first transform the empirical policy evaluation problem into a (quadratic) convex-concave saddle-point problem, and then present a primal-dual batch gradient method, as well as two stochastic variance-reduction methods, for solving it.
These algorithms scale linearly in both sample size and feature dimension.
Moreover, they achieve \emph{linear} convergence even when the saddle-point problem has only strong concavity in the dual variables but \emph{no} strong convexity in the primal variables.
Numerical experiments on benchmark problems demonstrate the effectiveness of our methods.
\end{abstract}
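As a rough illustration of the approach summarized above, the code below is a minimal sketch and not the paper's implementation. It assumes that the empirical policy evaluation problem is the empirical mean-squared projected Bellman error with an optional ridge term $\rho \ge 0$. Under that assumption it can be written as the saddle-point problem
\[
\min_{\theta}\max_{w}\; \frac{\rho}{2}\|\theta\|^2 - w^\top \hat{A}\theta - \frac{1}{2}w^\top \hat{C}w + w^\top \hat{b},
\]
with $\hat{A} = \frac{1}{n}\sum_t \phi_t(\phi_t - \gamma\phi_t')^\top$, $\hat{b} = \frac{1}{n}\sum_t r_t\phi_t$ and $\hat{C} = \frac{1}{n}\sum_t \phi_t\phi_t^\top$. The function name \texttt{pdbg\_policy\_evaluation}, the single step size shared by both variables, and the synthetic data are hypothetical choices. Every gradient is assembled from matrix-vector products with the $n \times d$ feature matrices, so one iteration costs $O(nd)$. With $\rho = 0$ the objective is not strongly convex in $\theta$. However, when $\hat{A}$ is nonsingular and $\hat{C}$ is positive definite, the update is a linear map whose iterates converge geometrically to the saddle point $(\hat{A}^{-1}\hat{b}, 0)$ for a small enough step.

```python
import numpy as np


def pdbg_policy_evaluation(phi, phi_next, rewards, gamma, rho=0.0,
                           step=0.5, num_iters=500):
    """Hypothetical primal-dual batch gradient sketch for the assumed objective

        L(theta, w) = rho/2 ||theta||^2 - w^T A theta - 1/2 w^T C w + w^T b,

    where A = mean_t phi_t (phi_t - gamma phi'_t)^T, b = mean_t r_t phi_t and
    C = mean_t phi_t phi_t^T. Only n x d matrix-vector products are used, so
    one iteration costs O(n d); no d x d matrix is formed.
    """
    n, d = phi.shape
    delta = phi - gamma * phi_next          # row t: phi_t - gamma * phi'_t
    b = phi.T @ rewards / n                 # b_hat
    theta = np.zeros(d)                     # primal variable (value weights)
    w = np.zeros(d)                         # dual variable
    for _ in range(num_iters):
        phi_w = phi @ w                     # Phi w, length n
        # Both gradients use the current (theta, w): a simultaneous update.
        grad_theta = rho * theta - delta.T @ phi_w / n       # rho*theta - A^T w
        grad_w = b - phi.T @ (delta @ theta + phi_w) / n     # b - A theta - C w
        theta = theta - step * grad_theta   # gradient descent in theta
        w = w + step * grad_w               # gradient ascent in w
    return theta, w


# Synthetic data (an assumption for illustration, not a benchmark from the paper).
rng = np.random.default_rng(0)
n, d, gamma = 2000, 4, 0.9
phi = rng.standard_normal((n, d))
phi_next = 0.5 * phi + 0.5 * rng.standard_normal((n, d))
rewards = phi @ np.array([1.0, -0.5, 0.25, 0.0]) + 0.1 * rng.standard_normal(n)

# rho = 0: the objective has no strong convexity in theta.
theta, w = pdbg_policy_evaluation(phi, phi_next, rewards, gamma, rho=0.0)

# With rho = 0 the saddle point is (A^{-1} b, 0), i.e. the LSTD solution.
A = phi.T @ (phi - gamma * phi_next) / n
b = phi.T @ rewards / n
theta_star = np.linalg.solve(A, b)
print("max |theta - theta_star| =", np.max(np.abs(theta - theta_star)))
print("max |w| =", np.max(np.abs(w)))
```

Comparing against $\hat{A}^{-1}\hat{b}$ is only a sanity check on this toy problem. Separate primal and dual step sizes, and the two stochastic variance-reduction methods mentioned in the abstract, are not reproduced here.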