\begin{abstract}
\begin{singlespace}
    Reinforcement Learning (RL) holds the promise of providing data-driven support for decision-making in a wide range of problems in healthcare, education, business, and other domains.
    Classical RL methods focus on the mean of the total return and, thus, may provide misleading results when the population is heterogeneous, as is common in large-scale datasets. We introduce the {\MODELFULL} ({\MODEL}) to address sequential decision problems with population heterogeneity.
    We propose the {\PEFULL} ({\PE}) for estimating the value of a given policy, and the {\PIFULL} ({\PI}) for estimating the optimal policy in a given policy class.
    Our auto-clustered algorithms detect and identify homogeneous sub-populations while estimating the $Q$ function and the optimal policy for each sub-population.
    We establish convergence rates and construct confidence intervals for the estimators obtained by the {\PE} and {\PI}.
    We present simulations to support our theoretical findings, and we conduct an empirical study on the standard MIMIC-III dataset.
    The latter analysis shows evidence of value heterogeneity and confirms the advantages of our new method.
\end{singlespace}
\end{abstract}