Social determinants of health (SDOH), the conditions in which people live, grow, and age, play a crucial role in a person's health and well-being. A large and compelling body of evidence from population health studies shows that a wide range of SDOH are strongly correlated with health outcomes. Yet most risk prediction models built on electronic health records (EHRs) do not incorporate a comprehensive set of SDOH features, because such features are often noisy or simply unavailable. Our work links a publicly available EHR database, MIMIC-IV, to well-documented SDOH features. We investigate the impact of these features on common EHR prediction tasks across different patient populations. We find that community-level SDOH features do not improve model performance for the general patient population, but they can improve model fairness for specific subpopulations in data-limited settings. We also demonstrate that SDOH features are vital for conducting thorough audits of algorithmic bias beyond protected attributes. We hope the new integrated EHR-SDOH database will enable studies on the relationship between community health and individual outcomes, and will provide new benchmarks for studying algorithmic bias beyond race, gender, and age.