Social determinants of health (SDOH), the conditions in which people live, grow, and age, play a crucial role in a person's health and well-being. A large, compelling body of evidence from population health studies shows that a wide range of SDOH are strongly correlated with health outcomes. Yet most risk prediction models based on electronic health records (EHR) do not incorporate a comprehensive set of SDOH features, because such features are often noisy or simply unavailable. Our work links a publicly available EHR database, MIMIC-IV, to well-documented SDOH features. We investigate the impact of these features on common EHR prediction tasks across different patient populations. We find that community-level SDOH features do not improve model performance for a general patient population, but can improve model fairness for specific subpopulations in data-limited settings. We also demonstrate that SDOH features are vital for conducting thorough audits of algorithmic biases beyond protected attributes. We hope the new integrated EHR-SDOH database will enable studies on the relationship between community health and individual outcomes and provide new benchmarks for studying algorithmic biases beyond race, gender, and age.