1: \begin{abstract}
2: % We consider the \gi{multi-period multi-class contextual} packing problem
3: % % in presence of
4: % with
5: % bandit
6: % feedback % when
7: % \gi{where
8: % % experimental cost
9: % reward
10: % and resource consumption % depends on
11: % % both
12: % for an action % is both
13: % % class % arms
14: % % context dependent.
15: % is a class-dependent linear function of the context.}
16: % \gi{We propose a new % efficient
17: % computationally efficient estimator % for which yields
18: % for the reward and resource consumption function with
19: % a faster convergence rate that results in lower regret.}
20: % % for estimated rewards and a
21: % % We also propose
22: % % computationally
23: % % efficient algorithm with a
24: % Our proposed bandit policy is available in closed form.
25: % % closed-form of the optimal policy.
26: % % With
27: % % the efficient estimator and the optimal policy,
28: % \gi{We show that the proposed policy and the new estimator
29: % % algorithm
30: % achieve sublinear regret with respect to % the dimension
31: % % of contexts,
32: % context dimension~$d$, number % of heterogeneous sub-group
33: % classes~$J$, and the
34: % time horizon $T$. % under non-degenerate stochastic contexts.
35: % % Finally,
36: % The results of our numerical experiments show that the performance
37: % of our
38: % % we demonstrate the superiority of our bandit
39: % policy is clearly superior to % over
40: % other candidate policies.}
41: % % based on a suite of numerical experiments.
42: % \end{abstract}
43: