\begin{abstract}

  Recent advances in gradient temporal-difference
  methods make it possible to learn multiple value functions off-policy and in
  parallel without sacrificing convergence guarantees or computational efficiency. This opens
  up new possibilities for sound ensemble techniques in reinforcement learning.
  In this work we propose learning an ensemble of policies related through
  potential-based shaping rewards. The ensemble induces a combination
  policy by applying a voting mechanism to its components. Learning
  happens in real time, and we show empirically that the combination policy
  outperforms the individual policies of the ensemble.

\end{abstract}
