\begin{abstract}
For the constrained multi-armed bandit model,
we show by construction that there exists an index-based, deterministic,
asymptotically optimal algorithm.
Optimality is achieved in the sense that the probability of
choosing an optimal feasible arm converges to one over an infinite horizon.
The algorithm builds upon Locatelli \emph{et al.}'s ``anytime
parameter-free thresholding'' algorithm under the assumption that
the optimal value is known. We provide a finite-time bound on the
probability of asymptotic optimality, given as $1-O(|A|Te^{-T})$,
where $T$ is the horizon size and $A$ is the set of arms in the bandit.
We then study a relaxed version of the algorithm, in a general form, that estimates
the optimal value, and we discuss, with examples, its asymptotic optimality
for sufficiently large $T$.
\end{abstract}