\begin{abstract}
For the stochastic multi-armed bandit (MAB) problem under a constrained model
that generalizes the classical one, we show that
asymptotic optimality is achievable by a simple strategy extended from the $\epsilon_t$-greedy strategy.
We provide a finite-time lower bound, valid at every time step, on the probability of correctly selecting an optimal near-feasible arm.
Under certain conditions, the bound approaches one as time $t$ goes to infinity.
We also discuss a particular example sequence $\{\epsilon_t\}$ whose asymptotic convergence rate is on the order of $(1-\frac{1}{t})^4$ for all sufficiently large $t$.
\end{abstract}