\begin{abstract}
This paper provides a statistical analysis of high-dimensional batch reinforcement learning (RL) using sparse linear function approximation.
When the number of candidate features is large, our results show that sparsity-aware methods can make batch RL
more sample-efficient. We first consider the off-policy policy evaluation problem. To evaluate a new target policy, we analyze a
Lasso fitted Q-evaluation method and establish a finite-sample error bound that has no polynomial dependence on the ambient dimension.
To reduce the Lasso bias, we further propose a post-model-selection estimator that applies fitted Q-evaluation to the features selected
via group Lasso. Under an additional signal strength assumption, we derive a sharper instance-dependent error bound that depends
on a divergence function measuring the mismatch between the data distribution and the occupancy measure of the target policy.
We then study Lasso fitted Q-iteration for batch policy optimization and establish a finite-sample error bound that depends
on the ratio between the number of relevant features and the restricted minimal eigenvalue of the data covariance. Finally, we complement
these results with minimax lower bounds for batch-data policy evaluation and optimization that nearly match our upper bounds.
The results suggest that having well-conditioned data is crucial for sparse batch policy learning.
\end{abstract}