In the high-stakes realm of healthcare, ensuring fairness in predictive models is crucial. Electronic Health Records (EHRs) have become integral to medical decision-making, yet existing methods for improving model fairness are restricted to unimodal data and fail to address the multifaceted social biases intertwined with demographic factors in EHRs. To mitigate these biases, we present FairEHR-CLP: a general framework for Fairness-aware Clinical Predictions with Contrastive Learning in EHRs. FairEHR-CLP operates in two stages and draws on patient demographics, longitudinal data, and clinical notes. First, synthetic counterparts are generated for each patient, varying the demographic identity while preserving essential health information. Second, the fairness-aware prediction stage uses contrastive learning to align patient representations across sensitive attributes, jointly optimized with an MLP classifier with a softmax layer for clinical classification tasks. Because EHRs pose unique challenges, such as varying group sizes and class imbalance, we introduce a novel fairness metric that measures error rate disparities across subgroups. Extensive experiments on three diverse EHR datasets across three tasks demonstrate the effectiveness of FairEHR-CLP in terms of both fairness and utility compared with competitive baselines. FairEHR-CLP is a step toward ensuring both accuracy and equity in predictive healthcare models.
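As a rough illustration of the second stage, the sketch below combines a softmax cross-entropy classification loss with a contrastive term that pulls each patient's representation toward that of its synthetic counterpart. It is a minimal sketch under stated assumptions, not the loss used by FairEHR-CLP: the InfoNCE-style form of the contrastive term, the `temperature` and `lam` parameters, and all function names are hypothetical choices made for illustration.

```python
import math


def log_softmax_row(row):
    """Numerically stable log-softmax of one list of scores."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_z for v in row]


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch of logit rows."""
    losses = [-log_softmax_row(row)[y] for row, y in zip(logits, labels)]
    return sum(losses) / len(losses)


def normalize(vec):
    """Scale a vector to unit L2 norm (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def contrastive_alignment_loss(z_real, z_synth, temperature=0.1):
    """InfoNCE-style loss (an assumption, not the paper's definition).

    Row i of z_real is a patient embedding; row i of z_synth is the
    embedding of that patient's synthetic counterpart (the positive).
    All other synthetic embeddings in the batch act as negatives.
    """
    real = [normalize(z) for z in z_real]
    synth = [normalize(z) for z in z_synth]
    # Cosine similarity between every real/synthetic pair, scaled by temperature.
    sims = [
        [sum(a * b for a, b in zip(r, s)) / temperature for s in synth]
        for r in real
    ]
    # The matching counterpart sits on the diagonal, so the target index is i.
    return cross_entropy(sims, list(range(len(real))))


def joint_loss(logits, labels, z_real, z_synth, lam=0.5, temperature=0.1):
    """Classification loss plus a weighted contrastive term (lam is hypothetical)."""
    classification = cross_entropy(logits, labels)
    alignment = contrastive_alignment_loss(z_real, z_synth, temperature)
    return classification + lam * alignment


if __name__ == "__main__":
    # Toy batch: two patients, binary task, 2-d embeddings (made-up numbers).
    logits = [[2.0, 0.5], [0.1, 1.2]]
    labels = [0, 1]
    z_real = [[1.0, 0.2], [0.1, 1.0]]
    z_synth = [[0.9, 0.3], [0.2, 0.8]]
    print(joint_loss(logits, labels, z_real, z_synth))
```

In a setup like this, `lam` trades off predictive utility against alignment across demographic variants: setting it to zero recovers plain classification, while larger values push the encoder to ignore the attributes that differ between a patient and its counterpart.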