\begin{abstract}
We consider the top-$k$ arm identification problem for multi-armed bandits with rewards belonging to a one-parameter canonical exponential family.
The objective is to select the set of $k$ arms with the highest mean rewards by sequentially allocating sampling effort.
We propose a unified optimal allocation problem that identifies the complexity measures of this problem in the fixed-confidence and fixed-budget settings, as well as the posterior convergence rate from the Bayesian perspective.
We provide the first characterization of optimality for this allocation problem.
We further provide the first provably optimal algorithm in the fixed-confidence setting for $k>1$.
We also propose an efficient heuristic algorithm for the top-$k$ identification problem.
Extensive numerical experiments demonstrate superior performance compared to existing methods in all three settings.
%We propose asymptotically optimal anytime and parameter-free algorithms based on the analysis of the optimal allocation problem.
%Numerical experiments demonstrate superior performance over existing algorithms.
\end{abstract}