Scientists often want to explain why an outcome differs between two groups. For instance, differences in patient mortality rates across two hospitals could be due to differences in the patients themselves (covariates) or differences in medical care (outcomes given covariates). The Oaxaca-Blinder decomposition (OBD) is a standard tool to tease apart these factors. It is well known that the OBD requires choosing one of the groups as a reference, and that the numerical answer can vary with the reference. To the best of our knowledge, there has not been a systematic investigation into whether the choice of OBD reference can yield different substantive conclusions, or into how common this issue is. In the present paper, we give existence proofs in real and simulated data that the choice of OBD reference can yield substantively different conclusions, and that these differences are not entirely driven by model misspecification or small sample sizes. We prove that substantively different conclusions occur in up to half of the parameter space, but find these discrepancies rare in the real-data analyses we study. We explain this empirical rarity by examining how realistic data-generating processes can be biased towards parameters that do not change conclusions under the OBD.
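To make the reference dependence concrete, the following is a minimal sketch of a twofold OBD with a linear outcome model, computed once with each group as reference. It is an illustration under stated assumptions, not the procedure used in the paper: the function names (`fit_ols`, `oaxaca_blinder`), the single-covariate simulated data, and all parameter values are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Fit OLS with an intercept; return coefficients and the mean design row."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta, X1.mean(axis=0)


def oaxaca_blinder(Xa, ya, Xb, yb):
    """Twofold decomposition of mean(ya) - mean(yb) under each reference group."""
    beta_a, xbar_a = fit_ols(Xa, ya)
    beta_b, xbar_b = fit_ols(Xb, yb)
    d_x = xbar_a - xbar_b        # difference in mean covariates
    d_beta = beta_a - beta_b     # difference in fitted coefficients
    return {
        "gap": ya.mean() - yb.mean(),
        # Reference A: covariate gap priced with group A's coefficients.
        "ref_a": {"covariates": d_x @ beta_a, "outcomes": xbar_b @ d_beta},
        # Reference B: covariate gap priced with group B's coefficients.
        "ref_b": {"covariates": d_x @ beta_b, "outcomes": xbar_a @ d_beta},
    }


# Hypothetical simulated data: the groups differ in both the covariate
# distribution and the outcome model (assumed values, for illustration only).
rng = np.random.default_rng(0)
n = 500
Xa = rng.normal(1.0, 1.0, size=(n, 1))
Xb = rng.normal(0.0, 1.0, size=(n, 1))
ya = 0.5 + 2.0 * Xa[:, 0] + rng.normal(size=n)
yb = 1.0 + 0.5 * Xb[:, 0] + rng.normal(size=n)

result = oaxaca_blinder(Xa, ya, Xb, yb)
for ref in ("ref_a", "ref_b"):
    parts = result[ref]
    print(ref, "covariates:", round(parts["covariates"], 3),
          "outcomes:", round(parts["outcomes"], 3))
```

Under either reference the two components sum to the same raw gap, but the split between them differs. In this assumed setup the outcome component even changes sign with the reference, which is the kind of substantive disagreement the abstract describes.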