\begin{abstract}
    Binary optimization has a wide range of applications in combinatorial optimization problems such as MaxCut, MIMO detection, and MaxSAT. However, these problems are typically NP-hard due to the binary constraints. We develop a novel probabilistic model that samples binary solutions according to a parameterized policy distribution. Specifically, minimizing the KL divergence between the parameterized policy distribution and the Gibbs distribution of the function value leads to a stochastic optimization problem whose policy gradient can be derived explicitly, as in reinforcement learning. For coherent exploration in discrete spaces, parallel Markov chain Monte Carlo (MCMC) methods are employed to draw diverse samples from the policy distribution and to approximate the gradient efficiently. We further develop a filter scheme that replaces the original objective function with one obtained by applying a local search technique, which broadens the horizon of the function landscape. Convergence of the policy gradient method to stationary points in expectation is established using a concentration inequality for MCMC. Numerical results show that this framework is promising for obtaining near-optimal solutions to a variety of binary optimization problems.
    \keywords{Binary optimization \and Policy gradient \and Local search \and Markov chain Monte Carlo \and Convergence}
    \subclass{90C09 % Boolean programming
        \and 90C27 % Combinatorial optimization
        \and 90C59 % Approximation methods and heuristics in mathematical programming
        %  \and 60J45 % Probabilistic potential theory
        \and 60J20 % Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.)
    }
\end{abstract}