Examining the Efficacy of Coarsened Exact Matching as an Alternative to Propensity Score Matching

Coarsened exact matching (CEM) is often promoted as superior to propensity score matching (PSM) for reducing covariate imbalance, model dependence, and bias, and for improving efficiency. However, this recommendation rests on uncertain grounds. First, CEM is commonly mischaracterized as exact matching, even though it matches on coarsened rather than original variables. Because the matching is inexact, residual confounding remains within strata. Mitigating the resulting bias requires correctly modeling the outcome-confounder relationship after matching, which increases vulnerability to model misspecification. Second, prior studies overlook the fact that any imbalance between treated and untreated subjects matched on the same propensity score is attributable to random variation. Thus, claims that CEM outperforms PSM in reducing imbalance are unfounded, particularly when they rest on metrics such as the Mahalanobis distance, which do not account for chance imbalance under PSM. Our simulations show that PSM reduces imbalance more effectively than CEM when evaluated with multivariate standardized mean differences (SMD), and unadjusted analyses show greater bias with CEM. Although adjusted analyses after CEM with autocoarsening and after PSM may perform similarly when matching on few variables, CEM suffers from the curse of dimensionality as the number of covariates increases, resulting in substantial data loss and unstable estimates. Coarsening more heavily may reduce data loss but exacerbates residual confounding and model dependence. In contrast, both analytical results and simulations demonstrate that PSM is more robust to model misspecification and thus less model-dependent. Therefore, CEM is not a viable alternative to PSM when matching on a large number of covariates.
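To make the data-loss and residual-confounding arguments concrete, here is a minimal Python sketch of CEM under stated assumptions. It uses equal-width bins over each covariate's observed range. This is a simple stand-in for autocoarsening, not the binning rule used by existing CEM software. The covariates are simulated as standard normal, and treatment depends only on the first covariate. All helper names (`simulate`, `cem_strata`, `treated_retention`, `residual_imbalance`) are hypothetical. The sketch illustrates two effects. First, the share of treated units that survive pruning falls as covariates are added. Second, coarser bins retain more units but leave a larger within-stratum difference in the confounder.

```python
import math
import random
from collections import defaultdict


def simulate(n, p, seed=0):
    """Hypothetical data: p standard-normal covariates; treatment depends on x[0] only."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    treat = [1 if rng.random() < 1.0 / (1.0 + math.exp(-0.5 * row[0])) else 0 for row in X]
    return X, treat


def bin_index(value, lo, hi, n_bins):
    """Equal-width coarsening of one value into a bin index 0..n_bins-1."""
    if hi == lo:
        return 0
    k = int((value - lo) / (hi - lo) * n_bins)
    return min(k, n_bins - 1)  # the maximum observed value falls into the last bin


def cem_strata(X, n_bins):
    """Group unit indices by their vector of coarsened covariate values."""
    p = len(X[0])
    lows = [min(row[j] for row in X) for j in range(p)]
    highs = [max(row[j] for row in X) for j in range(p)]
    strata = defaultdict(list)
    for i, row in enumerate(X):
        key = tuple(bin_index(row[j], lows[j], highs[j], n_bins) for j in range(p))
        strata[key].append(i)
    return strata


def cem_matched(X, treat, n_bins):
    """Keep only units in strata that contain both treated and untreated units."""
    kept = []
    for members in cem_strata(X, n_bins).values():
        if {treat[i] for i in members} == {0, 1}:
            kept.extend(members)
    return sorted(kept)


def treated_retention(X, treat, n_bins):
    """Fraction of treated units that survive CEM pruning."""
    kept = set(cem_matched(X, treat, n_bins))
    return sum(1 for i in kept if treat[i] == 1) / sum(treat)


def residual_imbalance(X, treat, n_bins, j=0):
    """Treated-minus-control mean of covariate j within matched strata.

    Each stratum's difference is weighted by its number of treated units, which
    should equal the weighted difference under the usual CEM stratum weights.
    A nonzero value is imbalance left inside the coarsened bins.
    """
    total_treated = 0
    weighted_diff = 0.0
    for members in cem_strata(X, n_bins).values():
        t_vals = [X[i][j] for i in members if treat[i] == 1]
        c_vals = [X[i][j] for i in members if treat[i] == 0]
        if t_vals and c_vals:
            diff = sum(t_vals) / len(t_vals) - sum(c_vals) / len(c_vals)
            weighted_diff += len(t_vals) * diff
            total_treated += len(t_vals)
    if total_treated == 0:
        raise ValueError("no stratum contains both treated and untreated units")
    return weighted_diff / total_treated


if __name__ == "__main__":
    # Data loss as the number of matching covariates grows, at two coarsening levels.
    for p in (1, 2, 4, 8):
        X, treat = simulate(n=2000, p=p, seed=1)
        for n_bins in (2, 5):
            share = treated_retention(X, treat, n_bins)
            print(f"p={p} bins={n_bins}: {share:.1%} of treated units retained")
    # Residual within-stratum imbalance in the confounder at three coarsening levels.
    X, treat = simulate(n=2000, p=1, seed=1)
    for n_bins in (2, 5, 10):
        print(f"bins={n_bins}: residual mean difference in x0 = "
              f"{residual_imbalance(X, treat, n_bins):.3f}")
```

Run as a script, it prints treated-unit retention for one to eight covariates at two coarsening levels. It also prints the residual mean difference in the first covariate at three coarsening levels. The numbers depend on the seed and the simulated design and are not results from the study.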
