3937e393ae120ae0.tex
1: \begin{abstract}
2: \begin{spacing}{1.2}
3:     We study statistical decisions for dynamic sequential treatment assignment problems. Many public policies and medical interventions involve dynamic treatment assignment, in which treatments are assigned sequentially to individuals across multiple stages, and the effect of treatment at each stage is usually heterogeneous with respect to the history of prior treatments, past outcomes, and observed characteristics. We consider estimation of optimal dynamic treatment regimes (DTRs) that guide the treatment assignment for each individual at each stage based on the individual’s history. We propose a sequential doubly-robust learning approach to estimate the optimal DTR using observational data under the sequential ignorability assumption. The approach solves the treatment assignment problem at each stage through backward induction, which yields a computational advantage over existing methods. The approach consistently estimates the optimal DTR if either the propensity scores or the stage-specific action value functions are correctly specified.
4:     Using doubly-robust estimators of treatment scores and cross-fitting, the approach can achieve the minimax optimal convergence rate of welfare regret even when nuisance components are nonparametrically estimated.\bigskip \\
5: \noindent 
6: \textbf{Keywords:} Dynamic treatment effect, dynamic treatment regime, double/debiased machine learning, policy learning\\
7:  \textbf{JEL codes:} C22, C44, C54
8: \end{spacing}
9: \end{abstract}
10: