\begin{abstract}%   <- trailing ''%'' for backward compatibility of .sty file
The purpose of this paper is to provide further insight into the structure of the sequential allocation (``stochastic multi-armed bandit'',
or MAB) problem by establishing probability-one finite-horizon bounds and convergence rates for the sample (or ``pseudo'') regret associated with two simple classes of allocation policies $\pi$.

For any slowly increasing function $g$, subject to mild regularity constraints, we construct two policies (the $g$-Forcing and the $g$-Inflated Sample Mean) that achieve a measure of regret of order $O(g(n))$ almost surely as $n \to \infty$, bounded from above and below. Additionally, almost sure upper and lower bounds on the remainder term are established. In the constructions herein, the function $g$ effectively controls the ``exploration'' of the classical ``exploration/exploitation'' tradeoff.
\end{abstract}