\begin{abstract}
Levin Tree Search (LTS) is a search algorithm that makes use of a policy (a probability distribution over actions)
and comes with a theoretical guarantee on the number of expansions before reaching a goal node, depending on the quality of the policy.
This guarantee can be used as a loss function, which we call the LTS loss, to optimize neural networks representing the policy (LTS+NN).
In this work, we show that the neural network can be substituted with parameterized context models originating from the online compression literature (LTS+CM).
We show that the LTS loss is convex under this new model,
which allows for using standard convex optimization tools,
and obtain convergence guarantees to the optimal parameters in an online setting for a given set of solution trajectories; such guarantees cannot be provided for neural networks.
The new LTS+CM algorithm compares favorably against LTS+NN on several benchmarks: Sokoban (Boxoban), The Witness, and the 24-Sliding Tile puzzle (STP). The difference is particularly large on STP, where LTS+NN fails to solve most of the test instances while LTS+CM solves each test instance in a fraction of a second.
Furthermore, we show that LTS+CM is able to learn a policy that solves the Rubik's cube in only a few hundred expansions, which considerably improves upon previous machine learning techniques.
\end{abstract}