
\begin{abstract}
\begin{singlespace}
    Reinforcement learning (RL) promises to provide data-driven support for decision-making in a wide range of problems in healthcare, education, business, and other domains.
    Classical RL methods focus on the mean of the total return and may therefore give misleading results when the population is heterogeneous, as is common in large-scale datasets. We introduce the {\MODELFULL} ({\MODEL}) to address sequential decision problems with population heterogeneity.
    We propose the {\PEFULL} ({\PE}) for estimating the value of a given policy, and the {\PIFULL} ({\PI}) for estimating the optimal policy in a given policy class.
    Our auto-clustered algorithms automatically identify homogeneous sub-populations while estimating the $Q$ function and the optimal policy for each sub-population.
    We establish convergence rates and construct confidence intervals for the estimators obtained by the {\PE} and {\PI}.
    We present simulations to support our theoretical findings, and we conduct an empirical study on the standard MIMIC-III dataset.
    The latter analysis shows evidence of value heterogeneity and confirms the advantages of our new method.
\end{singlespace}
\end{abstract}
