\begin{abstract}
In Path Integral control problems, a representation of the optimally controlled dynamical system can be computed formally and can serve as a guidepost for learning a parametrized policy.
The Path Integral Cross-Entropy (PICE) method attempts to exploit this but is hampered by poor sample efficiency.
We propose a model-free algorithm called ASPIC (Adaptive Smoothing of Path Integral Control) that applies an inf-convolution to the cost function to speed up the convergence of policy optimization.
We identify PICE as the infinite smoothing limit of this technique and show that the sample efficiency problems that PICE suffers from disappear for finite levels of smoothing.
For zero smoothing, the method becomes a greedy optimization of the cost, which is the standard approach in current reinforcement learning.
We show analytically and empirically that intermediate levels of smoothing are optimal, which renders the new method superior to both PICE and direct cost optimization.
\end{abstract}