\begin{abstract}
\begin{singlespace}
Reinforcement Learning (RL) promises data-driven support for decision-making in a wide range of problems in healthcare, education, business, and other domains.
Classical RL methods focus on the mean of the total return and may therefore give misleading results for the heterogeneous populations that commonly underlie large-scale datasets. We introduce the {\MODELFULL} ({\MODEL}) to address sequential decision problems with population heterogeneity.
We propose the {\PEFULL} ({\PE}) for estimating the value of a given policy, and the {\PIFULL} ({\PI}) for estimating the optimal policy within a given policy class.
Our auto-clustered algorithms automatically identify homogeneous sub-populations while estimating the $Q$ function and the optimal policy for each sub-population.
We establish convergence rates and construct confidence intervals for the estimators obtained by the {\PE} and {\PI}.
We present simulations that support our theoretical findings, and we conduct an empirical study on the standard MIMIC-III dataset.
The latter analysis shows evidence of value heterogeneity and confirms the advantages of our new method.
\end{singlespace}
\end{abstract}
