\begin{abstract}
For a constrained stochastic multi-armed bandit (MAB) model
that generalizes the classical one, we show that
asymptotic optimality is achievable by a simple strategy extended from the $\epsilon_t$-greedy strategy.
We provide a finite-time lower bound on the probability of correctly selecting an optimal near-feasible arm;
the bound holds for all time steps and, under some conditions, approaches one as the time $t$ goes to infinity.
We also discuss a particular example sequence $\{\epsilon_t\}$ for which the asymptotic convergence rate is of the order of $(1-\frac{1}{t})^4$ for all sufficiently large $t$.
\end{abstract}