2641cdf0d3e0fb88.tex
1: \begin{abstract}
2: 	We present a predictor-corrector framework, called \piccolo, that can transform a first-order model-free reinforcement or imitation learning algorithm into a new hybrid method that leverages predictive models to accelerate policy learning.  
3: 	The new ``{\piccolo}ed'' algorithm optimizes a policy by recursively repeating two steps: In the Prediction Step, the learner uses a model to predict the unseen future gradient 
4: 	and then applies the predicted estimate to update the policy; in the Correction Step, the learner runs the updated policy in the environment, receives the true gradient, and then corrects the policy using the gradient error.  
5: 	Unlike previous algorithms, \piccolo corrects for the mistakes of using imperfect predicted gradients and hence does not suffer from model bias.
6: 	The development of \piccolo is made possible by 
7: 	a novel
8: 	reduction from predictable online learning to adversarial online learning, which provides a systematic way to modify existing first-order algorithms to achieve the optimal regret with respect to predictable information.
9: 	We show, in both theory and simulation, that the convergence rate 
10: 	of several first-order model-free algorithms can be improved by \piccolo.
11: \end{abstract}
12: