Many policies involve dynamics in their treatment assignments, where individuals receive sequential interventions over multiple stages. We study estimation of an optimal dynamic treatment regime that guides the optimal treatment assignment for each individual at each stage based on that individual's history. We propose an empirical welfare maximization approach in this dynamic framework, which estimates the optimal dynamic treatment regime using data from an experimental or quasi-experimental study while satisfying exogenous constraints on policies. The paper proposes two estimation methods: one solves the treatment assignment problem sequentially through backward induction, and the other solves the entire problem simultaneously across all stages. We establish finite-sample upper bounds on the worst-case average welfare regret of each method and show that both attain the optimal $n^{-1/2}$ convergence rate. We also modify the simultaneous estimation method to accommodate intertemporal budget/capacity constraints.
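To illustrate the backward-induction approach described above, the following is a minimal sketch in a two-stage randomized experiment with known treatment probabilities. It estimates each stage's policy from a simple class of threshold rules by maximizing an inverse-propensity-weighted welfare estimate, working backward from the last stage. All variable names, the data-generating process, and the policy class are hypothetical illustrations, not the paper's actual specification.

```python
import numpy as np

# Hypothetical two-stage experimental data: covariates x1, x2, randomized
# binary treatments d1, d2 (assignment probability 0.5 each), final outcome y.
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
d1 = rng.integers(0, 2, size=n)
x2 = x1 + d1 + rng.normal(size=n)           # stage-2 history depends on stage 1
d2 = rng.integers(0, 2, size=n)
y = x1 + d1 * (x1 > 0) + d2 * (x2 > 0) + rng.normal(size=n)
p1 = p2 = 0.5                                # known propensities

# Candidate policy class: threshold rules "treat iff covariate >= t".
grid = np.linspace(-2.0, 2.0, 41)

def welfare_stage2(t):
    """IPW estimate of welfare for the stage-2 rule d2 = 1{x2 >= t}."""
    pi2 = (x2 >= t).astype(int)
    return np.mean(y * (d2 == pi2) / p2)

# Backward induction, step 1: choose the stage-2 rule first.
t2_hat = grid[np.argmax([welfare_stage2(t) for t in grid])]
pi2_hat = (x2 >= t2_hat).astype(int)

def welfare_stage1(t):
    """IPW welfare of the stage-1 rule, with stage 2 fixed at the
    estimated rule pi2_hat (both stages' assignments must match)."""
    pi1 = (x1 >= t).astype(int)
    return np.mean(y * (d1 == pi1) * (d2 == pi2_hat) / (p1 * p2))

# Backward induction, step 2: choose the stage-1 rule given pi2_hat.
t1_hat = grid[np.argmax([welfare_stage1(t) for t in grid])]
print(f"estimated thresholds: stage 1 = {t1_hat:.2f}, stage 2 = {t2_hat:.2f}")
```

The simultaneous method mentioned in the abstract would instead search over pairs of rules $(t_1, t_2)$ jointly, maximizing the two-stage IPW welfare in one optimization rather than stage by stage.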