abstract:02dd14bc1ac1c894.tex

1: \begin{abstract}

2: %   We consider the \gi{multi-period multi-class contextual} packing problem

3: %   % in presence of

4: %   with

5: %   bandit

6: %   feedback % when

7: %   \gi{where

8: %   % experimental cost

9: %   reward

10: %   and resource consumption % depends on

11: %   % both

12: %   for an action % is both

13: %   % class % arms

14: %   % context dependent.

15: %   are class-dependent linear functions of the context.}

16: %   \gi{We propose a new % efficient

17: %   computationally efficient estimator % for  which yields

18: %   for the reward and resource consumption functions with

19: %   a faster convergence rate that results in a lower regret.}

20: % % for estimated rewards and a

21: % % We also propose

22: % % computationally

23: % % efficient algorithm with a

24: % Our proposed bandit policy is available in closed form.

25: % % closed-form of the optimal policy.

26: % % With

27: % % the efficient estimator and the optimal policy,

28: % \gi{We show that the proposed policy and the new estimator

29: % % algorithm

30: % achieve sublinear regret with respect to % the dimension

31: % % of contexts,

32: % the context dimension~$d$, the number of % heterogeneous sub-group

33: % classes~$J$, and the

34: % time horizon $T$. % under non-degenerate stochastic contexts.

35: % % Finally,

36: % The results of our numerical experiments clearly show that the performance

37: % of our

38: % % we demonstrate the superiority of our bandit

39: % policy is superior to % over

40: % other candidate policies.}

41: % % based on a suite of numerical experiments.

42: % \end{abstract}

43: