\begin{abstract}
Various algorithms in reinforcement learning exhibit dramatic
variability in their convergence rates and ultimate accuracy as a
function of the problem structure. Such instance-specific behavior
is not captured by existing global minimax bounds, which are
worst-case in nature. We analyze the problem of estimating optimal
$Q$-value functions for a discounted Markov decision process with
discrete states and actions and identify an instance-dependent
functional that controls the difficulty of estimation in the
$\ell_\infty$-norm. Using a local minimax framework, we show that
this functional arises in lower bounds on the accuracy of any
estimation procedure. In the other direction, we establish the
sharpness of our lower bounds, up to factors logarithmic in the
sizes of the state and action spaces, by analyzing a variance-reduced
version of $Q$-learning. Our theory provides a precise way of distinguishing
``easy'' problems from ``hard'' ones in the context of $Q$-learning,
as illustrated by an ensemble with a continuum of difficulty.
\end{abstract}