Healthcare data in the United States often records only a patient's coarse race group: for example, both Indian and Chinese patients are typically coded as "Asian." It is unknown, however, whether this coarse coding conceals meaningful disparities in the performance of clinical risk scores across granular race groups. Here we show that it does. Using data from 418K emergency department visits, we assess clinical risk score performance disparities across 26 granular groups for three outcomes, five risk scores, and four performance metrics. Across outcomes and metrics, we show that the risk scores exhibit significant granular performance disparities within coarse race groups. In fact, variation in performance within coarse groups often *exceeds* the variation between coarse groups. We explore why these disparities arise, finding that outcome rates, feature distributions, and the relationships between features and outcomes all vary significantly across granular groups. Our results suggest that healthcare providers, hospital systems, and machine learning researchers should strive to collect, release, and use granular race data in place of coarse race data, and that existing analyses may significantly underestimate racial disparities in performance.
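The comparison of variation within coarse groups against variation between coarse groups can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name its four metrics, so AUC is used as a stand-in; "variation" is measured here as the simple range (max minus min) of group AUCs; and the group labels `A`, `B`, `A1`, `A2`, `B1`, `B2` and the toy records are hypothetical, not data from the study.

```python
from collections import defaultdict


def auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def group_auc(records, key):
    """AUC of the risk score computed separately for each value of `key`."""
    buckets = defaultdict(lambda: ([], []))
    for r in records:
        ys, ss = buckets[r[key]]
        ys.append(r["y"])
        ss.append(r["score"])
    return {g: auc(ys, ss) for g, (ys, ss) in buckets.items()}


def spread(values):
    """Range (max - min) of the defined values; an assumed measure of variation."""
    vals = [v for v in values if v is not None]
    return max(vals) - min(vals) if vals else 0.0


def within_vs_between(records):
    """Return ({coarse group: spread of its granular AUCs}, spread of coarse AUCs)."""
    between = spread(group_auc(records, "coarse").values())
    parent = {r["granular"]: r["coarse"] for r in records}
    by_coarse = defaultdict(list)
    for g, a in group_auc(records, "granular").items():
        by_coarse[parent[g]].append(a)
    within = {c: spread(v) for c, v in by_coarse.items()}
    return within, between


def rows(coarse, granular, pairs):
    """Build toy records from (outcome, risk score) pairs; purely hypothetical data."""
    return [{"coarse": coarse, "granular": granular, "y": y, "score": s}
            for y, s in pairs]


toy = (
    rows("A", "A1", [(1, 0.9), (0, 0.1), (1, 0.8), (0, 0.2)])
    + rows("A", "A2", [(1, 0.3), (0, 0.6), (1, 0.7), (0, 0.4)])
    + rows("B", "B1", [(1, 0.8), (0, 0.3), (1, 0.4), (0, 0.5)])
    + rows("B", "B2", [(1, 0.7), (0, 0.2), (1, 0.6), (0, 0.65)])
)

within, between = within_vs_between(toy)
print("within-coarse spread:", within)
print("between-coarse spread:", between)
```

On this toy data the two coarse groups have similar pooled AUCs, while the two granular groups inside `A` differ widely, so the within-group spread for `A` exceeds the between-group spread. A real analysis would also need uncertainty estimates (for example, bootstrap confidence intervals), which this sketch omits.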