The regression discontinuity (RD) design is widely used for program evaluation with observational data. The existing literature has focused primarily on estimating the local average treatment effect at the existing treatment cutoff. In contrast, we consider policy learning under the RD design. Because the treatment assignment mechanism is deterministic, learning better treatment cutoffs requires extrapolation. We develop a robust optimization approach to finding optimal treatment cutoffs that improve upon the existing ones. We first decompose the expected utility into point-identifiable and unidentifiable components. We then propose an efficient doubly-robust estimator for the identifiable parts. To account for the unidentifiable components, we leverage the existence of multiple cutoffs, which are common under the RD design. Specifically, we assume that the heterogeneity in the conditional expectations of potential outcomes across different groups varies smoothly along the running variable. Under this assumption, we minimize the worst-case utility loss relative to the status quo policy. The resulting new treatment cutoffs have a safety guarantee: they will not yield a worse overall outcome than the existing cutoffs. Finally, we establish asymptotic regret bounds for the learned policy using semi-parametric efficiency theory. We apply the proposed methodology to empirical and simulated data sets.
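The worst-case logic described above can be illustrated with a minimal sketch. This is not the estimator proposed in the paper: it assumes a single group, a rule that treats units with running variable at or above the cutoff, and that lower bounds on the unidentified counterfactual means are already supplied (in the paper these would follow from the smoothness assumption across cutoffs). The function name `safe_cutoff` and all argument names are hypothetical.

```python
import numpy as np

def safe_cutoff(x, candidates, c0, m0_hat, m1_hat, m0_lower, m1_lower):
    """Pick the candidate cutoff with the largest worst-case gain over c0.

    Assumed rule: treat when x >= cutoff.
    m0_hat, m1_hat     : per-unit fitted means where they are identified
                         (control below c0, treated at or above c0).
    m0_lower, m1_lower : per-unit ASSUMED lower bounds on the
                         unidentified counterfactual means.
    """
    best_c, best_gain = c0, 0.0  # keeping the status quo has zero gain
    for c in candidates:
        if c < c0:
            # Units in [c, c0) become treated: Y(1) is unidentified there,
            # so use its lower bound; Y(0) is identified.
            mask = (x >= c) & (x < c0)
            gain = np.sum(m1_lower[mask] - m0_hat[mask])
        else:
            # Units in [c0, c) become untreated: Y(0) is unidentified there,
            # so use its lower bound; Y(1) is identified.
            mask = (x >= c0) & (x < c)
            gain = np.sum(m0_lower[mask] - m1_hat[mask])
        gain = float(gain) / len(x)
        if gain > best_gain:
            best_c, best_gain = c, gain
    return best_c, best_gain

# Toy inputs (made up for illustration only).
x = np.arange(10)
m0_hat = np.zeros(10)
m1_hat = np.ones(10)
m0_lower = -np.ones(10)
m1_lower = -np.ones(10)
m1_lower[4] = 0.5  # treatment is beneficial just below c0 even in the worst case

best_c, best_gain = safe_cutoff(x, [3, 4, 5, 6, 7], 5,
                                m0_hat, m1_hat, m0_lower, m1_lower)
print(best_c, best_gain)
```

Because the status quo cutoff is always a feasible choice with zero worst-case gain, the selected cutoff never has a negative worst-case gain, which mirrors the safety guarantee stated in the abstract. Tighter bounds allow larger departures from the status quo; uninformative bounds leave the cutoff unchanged.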