Background: Inverse probability of treatment weighting (IPTW) is used for confounding adjustment in observational studies. Newer weighting methods include energy balancing (EB), kernel optimal matching (KOM), and tailored-loss covariate balancing propensity scores (TLF), but practical guidance remains limited. We evaluate their performance when implemented according to published recommendations. Methods: We conducted Monte Carlo simulations across 36 scenarios varying sample size, treatment prevalence, and a complexity factor that increases confounding and reduces overlap. Data generation used predominantly categorical covariates with some correlation. The average treatment effect and the average treatment effect on the treated were estimated using IPTW, EB, KOM, and TLF combined with weighted least squares and, when supported, doubly robust (DR) estimation. Inference followed the published recommendations for each method when feasible, with standard alternatives used otherwise. The \textsc{PROBITsim} dataset was used for illustration. Results: DR estimation with an outcome regression adjusted for all confounders reduced sensitivity to the choice of weighting scheme, despite functional-form misspecification. EB and KOM were the most reliable; EB was tuning-free but scale dependent, whereas KOM required kernel and penalty choices. IPTW was sensitive to variance inflation when treatment prevalence was far from 50\%. TLF traded lower variance for higher bias, producing an RMSE plateau and sub-nominal confidence interval coverage. \textsc{PROBITsim} results mirrored these patterns. Conclusions: Rather than identifying a single best method, our findings highlight failure modes and tuning choices to monitor. When the outcome regression adjusts for all confounders, DR estimation can be dependable across weighting schemes. Incorporating weight-estimation uncertainty into confidence intervals remains a key challenge for the newer approaches.
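The core estimator the abstract describes, IPTW combined with weighted least squares, can be sketched as follows. This is a minimal illustration on simulated data, not the paper's implementation: the data-generating process, the two-confounder logistic propensity model, and the Newton-Raphson fit are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 2))                       # two continuous confounders (assumed)
ps_true = 1 / (1 + np.exp(-(0.5 * x[:, 0] - 0.5 * x[:, 1])))
t = rng.binomial(1, ps_true)                      # treatment assignment
y = 2.0 * t + x[:, 0] + x[:, 1] + rng.normal(size=n)  # outcome; true ATE = 2

# Fit a logistic propensity model by Newton-Raphson (equivalent to IRLS).
X = np.column_stack([np.ones(n), x])
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (t - p)
    hess = X.T @ (X * (p * (1 - p))[:, None])
    beta += np.linalg.solve(hess, grad)

# IPTW weights for the ATE: 1/e for treated, 1/(1-e) for controls.
e = 1 / (1 + np.exp(-X @ beta))
w = t / e + (1 - t) / (1 - e)

# Weighted least squares of y on an intercept and treatment;
# the treatment coefficient is the (Hajek-normalised) ATE estimate.
D = np.column_stack([np.ones(n), t])
ate = np.linalg.solve(D.T @ (D * w[:, None]), D.T @ (w * y))[1]
print(f"IPTW-WLS ATE estimate: {ate:.2f}")
```

In this sketch the propensity model is correctly specified, so the weights balance the confounders and the estimate is close to the true effect of 2; the abstract's scenarios instead probe what happens under limited overlap, misspecification, and extreme treatment prevalence.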