\begin{abstract}%
We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement learning when function approximation is used for marginal importance weights and $q$-functions and these are estimated with recent minimax methods. Under various combinations of realizability and completeness assumptions, we show that the minimax approach achieves fast rates of convergence for both the weights and the $q$-functions, characterized by the critical inequality \citep{bartlett2005}. Building on this result, we analyze convergence rates for OPE. In particular, we introduce novel alternative completeness conditions under which OPE is feasible, and we present the first finite-sample result with first-order efficiency in non-tabular environments, i.e., a bound whose leading term has the minimal coefficient.
%We offer theoretical investigations into off-policy evaluation in reinforcement learning using function approximators for (marginal importance) weights and value functions. Under various completeness assumptions, we show the minimax approach enables us to achieve a fast rate of convergence for weights and value functions, characterized by the critical inequality \citep{bartlett2005}. Based on this result, we analyze the convergence rate for OPE. In particular, we present the first finite sample result that indicates the first order asymptotic lower bound can be achieved in non-tabular environments.
\end{abstract}