\begin{abstract}
For the constrained multi-armed bandit model, we show by construction that
there exists an index-based, deterministic, asymptotically optimal algorithm.
Optimality here means that the probability of choosing an optimal feasible
arm converges to one as the horizon goes to infinity.
The algorithm builds on Locatelli \emph{et al.}'s ``anytime parameter-free
thresholding'' algorithm, under the assumption that the optimal value is known.
We provide a finite-time bound of $1-O(|A|Te^{-T})$ on the probability of
asymptotic optimality, where $T$ is the horizon length and $A$ is the set of
arms in the bandit.
We then study a relaxed version of the algorithm, in a general form that
estimates the optimal value, and discuss with examples its asymptotic
optimality for sufficiently large $T$.
\end{abstract}
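For orientation only, the following is the index rule of the thresholding algorithm cited above, as we recall it from Locatelli \emph{et al.} At each round it pulls the arm that minimizes an index. The index grows with the number of pulls and with the estimated gap to a threshold $\tau$. The notation is ours and is assumed for illustration: $N_k(t)$ is the number of pulls of arm $k$ up to round $t$, $\hat{\mu}_k(t)$ is its empirical mean, and $\epsilon>0$ is a precision parameter. This is the base rule, not the constrained algorithm constructed in this paper. The abstract does not say how $\tau$ is set in that construction.
\[
  k_{t+1} \in \arg\min_{k \in A} \; \sqrt{N_k(t)}\,\bigl(|\hat{\mu}_k(t) - \tau| + \epsilon\bigr).
\]
An arm that has already been sampled often, or whose empirical mean lies far from $\tau$, receives a large index and is sampled less. Samples therefore concentrate on arms whose side of the threshold is still uncertain.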