Propensity scores are commonly used to balance observed covariates when estimating treatment effects. Estimates obtained through propensity score weighting can be biased when the propensity score model fails to learn the true treatment assignment mechanism. We argue that the probabilistic output of a learned propensity score model should be calibrated, i.e., a predicted treatment probability of 90% should correspond to 90% of individuals being assigned to the treatment group. We propose simple recalibration techniques to ensure this property. We investigate the theoretical properties of a calibrated propensity score model and its role in unbiased treatment effect estimation. We demonstrate improved causal effect estimation with calibrated propensity scores on several tasks, including high-dimensional genome-wide association studies, where we also show that applying calibration to simpler propensity score models reduces computational requirements.
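To make the calibration property concrete, the following is a minimal sketch of one possible recalibration step followed by inverse propensity weighting. It is illustrative only: histogram binning is chosen here as a simple stand-in and is not necessarily the recalibration technique used in this work, and all function names (`fit_histogram_binning`, `recalibrate`, `ipw_ate`, `simulate`), the synthetic data-generating process, and the deliberately distorted score are assumptions introduced for the example.

```python
import random


def fit_histogram_binning(scores, treatments, n_bins=20):
    """Learn a recalibration map: the observed treatment rate in each score bin."""
    treated = [0] * n_bins
    counts = [0] * n_bins
    for s, t in zip(scores, treatments):
        b = min(int(s * n_bins), n_bins - 1)  # clamp s == 1.0 into the last bin
        treated[b] += t
        counts[b] += 1
    # Empty bins fall back to their midpoint so the map is defined everywhere.
    return [
        treated[b] / counts[b] if counts[b] else (b + 0.5) / n_bins
        for b in range(n_bins)
    ]


def recalibrate(scores, bin_rates):
    """Replace each raw score with the observed treatment rate of its bin."""
    n_bins = len(bin_rates)
    return [bin_rates[min(int(s * n_bins), n_bins - 1)] for s in scores]


def ipw_ate(outcomes, treatments, propensities, eps=1e-3):
    """Inverse-propensity-weighted estimate of the average treatment effect."""
    total = 0.0
    for y, t, e in zip(outcomes, treatments, propensities):
        e = min(max(e, eps), 1.0 - eps)  # clip to avoid division by zero
        total += t * y / e - (1 - t) * y / (1.0 - e)
    return total / len(outcomes)


def simulate(n=20000, seed=0):
    """Hypothetical synthetic data with one confounder and a true effect of 2.0."""
    rng = random.Random(seed)
    raw_scores, treatments, outcomes = [], [], []
    for _ in range(n):
        x = rng.random()                      # observed confounder
        e_true = 0.2 + 0.6 * x                # true assignment probability
        t = 1 if rng.random() < e_true else 0
        y = 2.0 * t + 3.0 * x + rng.gauss(0.0, 1.0)
        raw_scores.append(e_true ** 2)        # assumed miscalibrated model output
        treatments.append(t)
        outcomes.append(y)
    return raw_scores, treatments, outcomes


if __name__ == "__main__":
    raw, t, y = simulate()
    calibrated = recalibrate(raw, fit_histogram_binning(raw, t))
    print("IPW estimate with raw scores:       ", ipw_ate(y, t, raw))
    print("IPW estimate with calibrated scores:", ipw_ate(y, t, calibrated))
```

In this sketch the recalibration map is fitted and applied on the same sample for brevity; in practice one would more likely fit it on held-out data. The simulated numbers only illustrate the mechanism and are not results from this work.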