
\begin{abstract}
This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation.
When there is a large number of candidate features, our results show that sparsity-aware methods can make batch RL
more sample efficient. We first consider the off-policy policy evaluation problem. To evaluate a new target policy, we analyze a
Lasso fitted Q-evaluation method and establish a finite-sample error bound that has no polynomial dependence on the ambient dimension.
To reduce the Lasso bias, we further propose a post-model-selection estimator that applies fitted Q-evaluation to the features selected
via group Lasso. Under an additional signal strength assumption, we derive a sharper instance-dependent error bound that depends
on a divergence function measuring the distribution mismatch between the data distribution and the occupancy measure of the target policy.
Further, we study Lasso fitted Q-iteration for batch policy optimization and establish a finite-sample error bound that depends
on the ratio between the number of relevant features and the restricted minimal eigenvalue of the data covariance. Finally, we complement
these results with minimax lower bounds for batch-data policy evaluation and optimization that nearly match our upper bounds.
The results suggest that having well-conditioned data is crucial for sparse batch policy learning.
\end{abstract}
