Machine learning methods, particularly the double machine learning (DML) estimator (Chernozhukov et al., 2018), are increasingly popular for estimating the average treatment effect (ATE). However, datasets often exhibit unbalanced treatment assignment, where only a few observations are treated, which leads to unstable propensity score estimates. We propose a simple extension of the DML estimator that undersamples the data used for propensity score modeling and then calibrates the estimated scores to match the original distribution. We provide theoretical results showing that the proposed estimator retains the asymptotic properties of the DML estimator. A simulation study illustrates its finite-sample performance.
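To make the undersample-then-calibrate idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes that all treated units are kept, that controls are retained at random with probability beta, and that scores are mapped back with the standard prior-correction formula for undersampled data, p = beta * p_s / (beta * p_s + 1 - p_s). The function names and the `target_ratio` parameter are hypothetical.

```python
import random


def undersample_controls(treatment, target_ratio=1.0, seed=0):
    """Keep every treated unit and a random subset of controls.

    target_ratio (hypothetical parameter) is the desired number of
    controls per treated unit. Returns the kept indices and beta,
    the fraction of controls that was retained.
    """
    rng = random.Random(seed)
    treated = [i for i, d in enumerate(treatment) if d == 1]
    controls = [i for i, d in enumerate(treatment) if d == 0]
    if not controls:
        raise ValueError("no control observations to undersample")
    n_keep = min(len(controls), int(round(target_ratio * len(treated))))
    kept = rng.sample(controls, n_keep)
    beta = n_keep / len(controls)
    return sorted(treated + kept), beta


def calibrate_score(p_s, beta):
    """Map a score fitted on undersampled data back to the full-sample scale.

    Assumes controls were kept independently of covariates with
    probability beta, so that p_s = p / (p + beta * (1 - p)).
    """
    return beta * p_s / (beta * p_s + 1.0 - p_s)


# Toy data: 50 treated and 950 control units (5% treated).
treatment = [1] * 50 + [0] * 950
idx, beta = undersample_controls(treatment, target_ratio=1.0)

# In the balanced subsample the average score is 0.5; calibrating it
# recovers the treated share of the full sample.
full_sample_score = calibrate_score(0.5, beta)
```

In a full pipeline, a propensity model would be fitted on the rows in `idx`, its predictions passed through `calibrate_score`, and the calibrated scores then used in the usual DML score for the ATE.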