abstract:cc15def94133fd90.tex

1: \begin{abstract}

2: %     This paper proposes a computationally efficient algorithm for pure exploration in linear bandits that leverages sampling and argmax oracles. Given a set of arms $\mc{Z}\subset \mathbb{R}^d$, the pure exploration linear bandit problem aims to return $\arg\max_{z\in \mc{Z}} z^{\top}\theta_{\ast}$ with high probability, using noisy measurements of $x^{\top}\theta_{\ast}$ with $x\in \mc{X}\subset \mathbb{R}^d$.

3: %     Existing (asymptotically) optimal methods scale with $|\mc{Z}|$ because they require either a) potentially costly projections for each arm $z\in \mc{Z}$ or b) explicitly maintaining a subset of $\mc{Z}$ under consideration at each time. In general, computing these projections may be expensive, and maintaining a subset of $\mc{Z}$ may be infeasible in many combinatorial settings. Our approach overcomes both limitations by \textit{sampling} from an appropriate distribution over possible parameters, combined with access to an argmax oracle. In this respect, its computational efficiency is similar to that of Thompson Sampling. However, unlike Thompson Sampling, which is known to be sub-optimal for pure exploration, our algorithm provably guarantees an exponential convergence rate whose exponent is asymptotically optimal among all possible allocations.

4: % \end{abstract}

5: