\begin{abstract}
We present the Adaptive Entropy Tree Search (ANTS) algorithm, a planning method based on the Principle of Maximum Entropy. Importantly, we design ANTS to be a practical component of a planning-learning loop, in which it outperforms state-of-the-art methods on the Atari benchmark. The key algorithmic novelty is entropy parameterization, which mitigates sensitivity to the temperature parameter, a bottleneck of prior maximum entropy planning methods. To validate our design choices, we perform a comprehensive suite of ablations in isolation from learning. Moreover, we theoretically show that ANTS enjoys exponential convergence in the softmax bandit setting.
\end{abstract}