\begin{abstract}
  Various algorithms for reinforcement learning (RL) exhibit dramatic
  variation in their convergence rates as a function of problem
  structure. Such problem-dependent behavior is not captured by
  worst-case analyses and has accordingly inspired a growing effort to
  obtain instance-dependent guarantees and to derive
  instance-optimal algorithms for RL problems. This research has been
  carried out, however, primarily within the confines of theory,
  providing guarantees that explain \textit{ex post} the performance
  differences observed. A natural next step is to convert these
  theoretical guarantees into guidelines that are useful in
  practice. We address the problem of obtaining sharp
  instance-dependent confidence regions for the policy evaluation
  problem and the optimal value estimation problem of a Markov
  decision process (MDP), given access to an instance-optimal
  algorithm. As a consequence, we propose a data-dependent stopping
  rule for instance-optimal algorithms. The proposed stopping rule
  adapts to the instance-specific difficulty of the problem and allows
  for early termination for problems with favorable structure.
\end{abstract}
