Public policies and medical interventions often involve dynamic treatment assignments, in which individuals receive a series of interventions over multiple stages. We study the statistical learning of optimal dynamic treatment regimes (DTRs), which prescribe the optimal treatment for each individual at each stage based on that individual's evolving history. We propose a doubly robust, classification-based approach to learning the optimal DTR from observational data under the assumption of sequential ignorability. The approach learns the optimal DTR through backward induction: at each step, it constructs an augmented inverse probability weighting (AIPW) estimator of the policy value function and maximizes it to learn the optimal policy for the corresponding stage. We show that the resulting DTR achieves an optimal $n^{-1/2}$ convergence rate for welfare regret under mild convergence conditions on the estimators of the nuisance components.
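The per-stage estimation step can be illustrated with a minimal single-stage sketch. The code below is not the paper's implementation; it is a hypothetical numpy-only illustration of an AIPW estimate of a policy's value, combining a direct regression term with an inverse-propensity-weighted correction. The function and argument names (`aipw_policy_value`, `mu_hat`, `e_hat`) are illustrative choices, and the nuisance estimates are passed in as already-fitted callables.

```python
import numpy as np

def aipw_policy_value(Y, A, X, policy, mu_hat, e_hat):
    """AIPW estimate of the value of `policy` for a single stage.

    Y      : (n,) observed outcomes
    A      : (n,) observed binary treatments in {0, 1}
    X      : (n, d) covariates
    policy : maps X -> treatment assignments in {0, 1}
    mu_hat : mu_hat(X, a) estimates the outcome regression E[Y | X, A = a]
    e_hat  : e_hat(X, a) estimates the propensity P(A = a | X)
    """
    pi = policy(X)
    # Direct (regression) term: predicted outcome under the policy's choice.
    direct = mu_hat(X, pi)
    # IPW correction: reweight residuals on units whose observed treatment
    # matches the policy, which restores root-n behavior if either nuisance
    # estimator is consistent (double robustness).
    match = (A == pi).astype(float)
    correction = match / e_hat(X, A) * (Y - mu_hat(X, A))
    return float(np.mean(direct + correction))

# Tiny deterministic example: zero outcome regression, constant propensity 0.5,
# and a policy that happens to reproduce the observed treatments, so the
# estimate reduces to the Horvitz-Thompson average mean(Y / 0.5) = 5.0.
Y = np.array([1.0, 2.0, 3.0, 4.0])
A = np.array([0, 1, 0, 1])
X = A.reshape(-1, 1).astype(float)
policy = lambda X: X[:, 0].astype(int)
mu_hat = lambda X, a: np.zeros(len(X))
e_hat = lambda X, a: np.full(len(X), 0.5)
value = aipw_policy_value(Y, A, X, policy, mu_hat, e_hat)
```

In the full backward-induction procedure described above, this estimator would be maximized over a policy class at each stage, with the later-stage optimal policies already plugged into the stage-specific pseudo-outcomes.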