\begin{abstract}
We present and analyze the Krylov--Bellman Boosting (KBB) algorithm for
policy evaluation in general state spaces. It alternates between
fitting the Bellman residual using non-parametric regression (as in
boosting) and estimating the value function via the least-squares
temporal difference (LSTD) procedure applied with a feature set that
grows adaptively over time. By exploiting the connection to Krylov
methods, we equip this method with two attractive guarantees. First,
we provide a general convergence bound that allows for separate
estimation errors in residual fitting and LSTD computation.
Consistent with our numerical experiments, this bound shows that
convergence rates depend on the restricted spectral structure, and are
typically super-linear. Second, by combining this meta-result with
sample-size-dependent guarantees for residual fitting and LSTD
computation, we obtain concrete statistical guarantees that depend on
the sample size along with the complexity of the function class used
to fit the residuals. We illustrate the behavior of the KBB algorithm
for various types of policy evaluation problems, and typically find
large reductions in sample complexity relative to the standard
approach of fitted value iteration.
\end{abstract}
