The partitioning of data between estimation and calibration critically affects the performance of propensity-score-based estimators such as inverse probability weighting (IPW) and double/debiased machine learning (DML). We extend recent advances in calibration techniques for propensity score estimation, improving the robustness of propensity scores in challenging settings such as limited overlap, small sample sizes, or unbalanced data. Our contributions are twofold. First, we provide a theoretical analysis of the properties of calibrated estimators in the context of DML; to this end, we refine existing calibration frameworks for propensity score models, with particular emphasis on the role of sample-splitting schemes in ensuring valid causal inference. Second, through extensive simulations, we show that calibration reduces the variance of estimators based on inverse propensity scores and mitigates the bias of IPW, even in small-sample regimes. Notably, calibration improves stability for flexible learners (e.g., gradient boosting) while preserving the doubly robust properties of DML. A key insight is that, even when methods perform well without calibration, adding a calibration step does not degrade performance, provided an appropriate sample-splitting scheme is chosen.