\begin{abstract}
  % ----- importance of contextual bandit ------
  The contextual bandit problem is a theoretically justified framework with applications across many fields.
  % ------- limitation of previous work -----
  While previous studies of this problem usually require independence between the noise and the contexts,
  % ------- our generalization -----
  our work considers a more sensible setting in which the noise is a latent confounder that affects both contexts and rewards.
  % -------- the importance of this improvement ------
  Such a confounded setting is more realistic and extends to a broader range of applications.
  % --------challenges ------------
  However, an unresolved confounder biases the estimation of the reward function and thus leads to large regret.
  % -------- link ------------
  To deal with the challenges brought by the confounder,
  % --------- our method ------
  we apply dual instrumental variable regression, which can correctly identify the true reward function.
  % --------- one theoretical result -------
  We prove that the convergence rate of this method is near-optimal in two types of widely used reproducing kernel Hilbert spaces.
  % --------- about bandit -------
  Building on these theoretical guarantees, we design computationally efficient and regret-optimal algorithms for confounded bandit problems.
  % --------- numerical ---------
  Numerical results illustrate the efficacy of our proposed algorithms in the confounded bandit setting.
\end{abstract}
