27f2479e5a0ace84.tex
1: \begin{abstract}
2: 
3: %PASS Yi LIU
4: 
5: %Thompson sampling has emerged as an effective heuristic.
6: Thompson sampling is a heuristic algorithm for the multi-armed bandit problem which has a long tradition in machine learning.  The algorithm has a Bayesian spirit in the sense that it selects arms based on posterior samples of  reward probabilities of each arm.  By forging a connection between combinatorial binary bandits and spike-and-slab variable selection, we propose a stochastic optimization approach to subset selection called Thompson Variable Selection (TVS). TVS is a framework for interpretable machine learning  which does not rely on the underlying model to be linear. TVS brings together Bayesian reinforcement and machine learning in order to extend the reach of Bayesian subset selection to non-parametric models and  large datasets  with very many predictors  and/or very many observations.  Depending on the choice of a reward, TVS can be deployed in  offline as well as  online setups with streaming data batches.  Tailoring multiplay bandits to variable selection, we provide regret bounds  without necessarily assuming that the arm mean rewards be unrelated. We show a very strong empirical performance on both simulated and real data. Unlike  deterministic optimization methods for spike-and-slab variable selection, the stochastic nature  makes TVS less prone to local convergence and thereby more robust.  
7: %In this paper, we propose a variant Thompson Sampling algorithm as a stochastic optimization method for variable selection and demonstrates its strong empirical performance in terms of speed, low false discovery rate and high power. 
8: 
9: \end{abstract}
10: