\begin{abstract}

In Path Integral control problems, a representation of an optimally controlled dynamical system can be formally computed and can serve as a guidepost for learning a parametrized policy.

The Path Integral Cross-Entropy (PICE) method tries to exploit this representation, but is hampered by poor sample efficiency.

We propose a model-free algorithm called ASPIC (Adaptive Smoothing of Path Integral Control) that applies an inf-convolution to the cost function to speed up convergence of policy optimization.

We identify PICE as the infinite smoothing limit of this technique and show that the sample efficiency problems that PICE suffers from disappear for finite levels of smoothing.

For zero smoothing this method becomes a greedy optimization of the cost, which is the standard approach in current reinforcement learning.

We show analytically and empirically that intermediate levels of smoothing are optimal, which renders the new method superior to both PICE and direct cost optimization.

\end{abstract}
