\begin{abstract}
We present a predictor-corrector framework, called \piccolo, that can transform a first-order model-free reinforcement or imitation learning algorithm into a new hybrid method that leverages predictive models to accelerate policy learning.
The new ``{\piccolo}ed'' algorithm optimizes a policy by recursively repeating two steps: In the Prediction Step, the learner uses a model to predict the unseen future gradient and then applies the predicted estimate to update the policy; in the Correction Step, the learner runs the updated policy in the environment, receives the true gradient, and then corrects the policy using the gradient error.
Unlike previous algorithms, \piccolo corrects for the mistakes of using imperfect predicted gradients and hence does not suffer from model bias.
The development of \piccolo is made possible by a novel reduction from predictable online learning to adversarial online learning, which provides a systematic way to modify existing first-order algorithms to achieve the optimal regret with respect to predictable information.
We show, in both theory and simulation, that the convergence rate of several first-order model-free algorithms can be improved by \piccolo.
\end{abstract}
