\begin{abstract}
We present and analyze the Krylov--Bellman Boosting (KBB) algorithm for
policy evaluation in general state spaces.  The algorithm alternates
between fitting the Bellman residual using non-parametric regression (as
in boosting), and estimating the value function via the least-squares
temporal difference (LSTD) procedure applied with a feature set that
grows adaptively over time. By exploiting the connection to Krylov
methods, we equip this method with two attractive guarantees.  First,
we provide a general convergence bound that allows for separate
estimation errors in residual fitting and LSTD computation.
Consistent with our numerical experiments, this bound shows that
convergence rates depend on the restricted spectral structure, and are
typically super-linear. Second, by combining this meta-result with
sample-size-dependent guarantees for residual fitting and LSTD
computation, we obtain concrete statistical guarantees that depend on
the sample size along with the complexity of the function class used
to fit the residuals.  We illustrate the behavior of the KBB algorithm
for various types of policy evaluation problems, and typically find
large reductions in sample complexity relative to the standard
approach of fitted value iteration.
\end{abstract}