
\begin{abstract}
  Various algorithms in reinforcement learning exhibit dramatic
  variability in their convergence rates and ultimate accuracy as a
  function of the problem structure.  Such instance-specific behavior
  is not captured by existing global minimax bounds, which are
  worst-case in nature.  We analyze the problem of estimating optimal
  $Q$-value functions for a discounted Markov decision process with
  discrete states and actions, and we identify an instance-dependent
  functional that controls the difficulty of estimation in the
  $\ell_\infty$-norm.  Using a local minimax framework, we show that
  this functional arises in lower bounds on the accuracy of any
  estimation procedure.  In the other direction, we establish the
  sharpness of our lower bounds, up to factors logarithmic in the
  sizes of the state and action spaces, by analyzing a variance-reduced
  version of $Q$-learning.  Our theory provides a precise way of
  distinguishing ``easy'' problems from ``hard'' ones in the context of
  $Q$-learning, as illustrated by an ensemble with a continuum of
  difficulty.
\end{abstract}
