\begin{abstract}
Various algorithms for reinforcement learning (RL) exhibit dramatic
variation in their convergence rates as a function of problem
structure. Such problem-dependent behavior is not captured by
worst-case analyses and has accordingly inspired a growing effort in
obtaining instance-dependent guarantees and deriving
instance-optimal algorithms for RL problems. This research has been
carried out, however, primarily within the confines of theory,
providing guarantees that explain \textit{ex post} the performance
differences observed. A natural next step is to convert these
theoretical guarantees into guidelines that are useful in
practice. We address the problem of obtaining sharp
instance-dependent confidence regions for the policy evaluation
problem and the optimal value estimation problem of an MDP, given
access to an instance-optimal algorithm. As a consequence, we
propose a data-dependent stopping rule for instance-optimal
algorithms. The proposed stopping rule adapts to the
instance-specific difficulty of the problem and allows for early
termination for problems with favorable structure.
\end{abstract}
