\begin{abstract}
% ----- importance of contextual bandit ------
The contextual bandit problem is a theoretically justified framework with applications across many fields.
% ------- limitation of previous work -----
While previous studies of this problem usually require independence between the noise and the contexts,
% ------- our generalization -----
our work considers a more sensible setting in which the noise is a latent confounder that affects both contexts and rewards.
% -------- the importance of this improvement ------
Such a confounded setting is more realistic and extends the framework to a broader range of applications.
% --------challenges ------------
However, an unaddressed confounder biases the estimation of the reward function and thus leads to large regret.
% -------- link ------------
To address the challenges brought by the confounder,
% --------- our method ------
we apply dual instrumental variable regression, which correctly identifies the true reward function.
% --------- one theoretical result -------
We prove that the convergence rate of this method is near-optimal in two types of widely used reproducing kernel Hilbert spaces.
% --------- about bandit -------
Building on these theoretical guarantees, we design computationally efficient and regret-optimal algorithms for confounded bandit problems.
% --------- numerical ---------
Numerical results illustrate the efficacy of our proposed algorithms in the confounded bandit setting.
\end{abstract}