\begin{abstract}
This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation.
When there is a large number of candidate features, our results show that sparsity-aware methods can make batch RL
more sample-efficient. We first consider the off-policy policy evaluation problem. To evaluate a new target policy, we analyze a
Lasso fitted Q-evaluation method and establish a finite-sample error bound that has no polynomial dependence on the ambient dimension.
To reduce the Lasso bias, we further propose a post model-selection estimator that applies fitted Q-evaluation to the features selected
via group Lasso. Under an additional signal strength assumption, we derive a sharper instance-dependent error bound that depends
on a divergence function measuring the distribution mismatch between the data distribution and the occupancy measure of the target policy.
Further, we study Lasso fitted Q-iteration for batch policy optimization and establish a finite-sample error bound that depends
on the ratio between the number of relevant features and the restricted minimal eigenvalue of the data's covariance. Finally, we complement
these results with minimax lower bounds for batch policy evaluation and optimization that nearly match our upper bounds.
The results suggest that having well-conditioned data is crucial for sparse batch policy learning.
\end{abstract}