\begin{abstract}
%   We consider the \gi{multi-period multi-class contextual} packing problem
%   % in presence of
%   with
%   bandit
%   feedback % when
%   \gi{where the
%   % experimental cost
%   reward
%   and resource consumption % depends on
%   % both
%   for an action % is both
%   % class % arms
%   % context dependent.
%   are class-dependent linear functions of the context.}
%   \gi{We propose a new % efficient
%   computationally efficient estimator % for  which yields
%   for the reward and resource consumption functions with
%   a faster convergence rate that results in lower regret.}
% % for estimated rewards and a
% % We also propose
% % computationally
% % efficient algorithm with a
% Our proposed bandit policy is available in closed form.
% % closed-form of the optimal policy.
% % With
% % the efficient estimator and the optimal policy,
% \gi{We show that the proposed policy and the new estimator
% % algorithm
% achieve sublinear regret with respect to % the dimension
% % of contexts,
% the context dimension~$d$, the number of % heterogeneous sub-group
% classes~$J$, and the
% time horizon $T$. % under non-degenerate stochastic contexts.
% % Finally,
% Our numerical experiments show that
% our
% % we demonstrate the superiority of our bandit
% policy clearly outperforms % over
% other candidate policies.}
% % based on a suite of numerical experiments.
% \end{abstract}
