\begin{abstract}
	We present a predictor-corrector framework, called \piccolo, that can transform a first-order model-free reinforcement or imitation learning algorithm into a new hybrid method that leverages predictive models to accelerate policy learning.
	The new ``{\piccolo}ed'' algorithm optimizes a policy by recursively repeating two steps: in the Prediction Step, the learner uses a model to predict the unseen future gradient and then applies the predicted estimate to update the policy; in the Correction Step, the learner runs the updated policy in the environment, receives the true gradient, and then corrects the policy using the gradient error.
	Unlike previous algorithms, \piccolo corrects for the mistakes of using imperfect predicted gradients and hence does not suffer from model bias.
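	For instance, assume, purely for illustration, that the base first-order method is plain gradient descent with step size $\eta$; let $\pi_{n-1}$ denote the current policy, $\hat{g}_n$ the model-predicted gradient, and $g_n$ the true gradient measured by running the updated policy $\hat{\pi}_n$ (this notation is introduced only for this example). One round of the two steps then reads
	\[
	\begin{array}{rcll}
	\hat{\pi}_n &=& \pi_{n-1} - \eta\,\hat{g}_n & \mbox{(Prediction Step)},\\
	\pi_n &=& \hat{\pi}_n - \eta\,(g_n - \hat{g}_n) & \mbox{(Correction Step)}.
	\end{array}
	\]
	Substituting the first line into the second gives $\pi_n = \pi_{n-1} - \eta\,g_n$, so in this instance the $\hat{g}_n$ terms cancel in the net update, and the prediction only affects where the true gradient is evaluated.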

	The development of \piccolo is made possible by a novel reduction from predictable online learning to adversarial online learning, which provides a systematic way to modify existing first-order algorithms to achieve the optimal regret with respect to predictable information.
	We show, in both theory and simulation, that \piccolo improves the convergence rate of several first-order model-free algorithms.
\end{abstract}