Healthcare data in the United States often records only a patient's coarse race group: for example, both Indian and Chinese patients are typically coded as ``Asian.'' It is unknown, however, whether this coarse coding conceals meaningful disparities in the performance of clinical risk scores across granular race groups. Here we show that it does. Using data from 418K emergency department visits, we assess clinical risk score performance disparities across granular race groups for three outcomes, five risk scores, and four performance metrics. Across outcomes and metrics, we show that there are significant granular disparities in performance within coarse race categories. In fact, variation in performance metrics within coarse groups often exceeds the variation between coarse groups. We explore why these disparities arise, finding that outcome rates, feature distributions, and the relationships between features and outcomes all vary significantly across granular race categories. Our results suggest that healthcare providers, hospital systems, and machine learning researchers should strive to collect, release, and use granular race data in place of coarse race data, and that existing analyses may significantly underestimate racial disparities in performance.