1: \begin{abstract}
2: Search-based methods for hard combinatorial optimization are often guided by heuristics, and tuning those heuristics for different conditions and problem settings is often time-consuming. In this paper, we propose {\ours}, which learns a policy to pick heuristics and rewrite local components of the current solution, iteratively improving it until convergence. The policy factorizes into a region-picking and a rule-picking component, each parameterized by a neural network trained with actor-critic reinforcement learning. {\ours} captures the general structure of combinatorial problems and shows strong performance on three diverse tasks: expression simplification, online job scheduling, and vehicle routing. {\ours} outperforms the expression simplification component in Z3~\cite{de2008z3}; outperforms DeepRM~\cite{mao2016resource} and Google OR-tools~\cite{GoogleOrTools} in online job scheduling; and outperforms recent neural baselines~\cite{nazari2018reinforcement,kool2018attention} and Google OR-tools~\cite{GoogleOrTools} in vehicle routing problems.\footnote{The code is available at~\url{https://github.com/facebookresearch/neural-rewriter}.} %NeuroSAT~\cite{selsam2018learning} and DG-DAGRNN~\cite{amizadeh2018learning} in SAT with a small number of variables.
3: 
4: %~\xinyun{Actually I think the title is somehow ambiguous, because construct the planning from scratch is also ``progressive''... Should we say something about rewriting in the title?}
5: %\yuandong{We also need to compare with planning approaches, to justify that the performance is comparable but we are faster.}  
6: \end{abstract}
7: