Intersectional biases in healthcare data can produce compounding disparities in clinical machine learning models, yet most fairness evaluations assess each demographic attribute independently. FairLogue, a toolkit for intersectional fairness auditing, was applied to multiple clinical prediction tasks to evaluate disparities across combined demographic groups. Using the All of Us dataset, two published models were selected for replication and evaluation: (A) prediction of selective serotonin reuptake inhibitor (SSRI)-associated bleeding events and (B) prediction of two-year stroke risk in patients with atrial fibrillation. Observational fairness metrics were computed across race, gender, and their intersectional subgroups, followed by counterfactual analysis to evaluate whether the observed disparities were attributable to group membership. Intersectional evaluation revealed larger disparities than single-axis analyses; however, counterfactual diagnostics indicated that most observed disparities were comparable in magnitude to those expected under randomized group membership. These results highlight the importance of intersectional fairness auditing and demonstrate how FairLogue can provide deeper insight into bias in clinical machine learning systems.
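The two-stage workflow summarized above (observational metrics across single-axis and intersectional subgroups, then a check against randomized group membership) can be illustrated with a small, self-contained sketch. This is not FairLogue's API or its counterfactual procedure. As an assumption, the disparity metric is the largest gap in positive prediction rate between subgroups, and "randomized group membership" is approximated by a permutation test that shuffles subgroup labels while holding predictions fixed. The function names (`subgroup_rates`, `max_gap`, `permutation_test`, `toy_cohort`) and the synthetic cohort are hypothetical.

```python
import random
from collections import defaultdict


def subgroup_rates(y_pred, groups):
    """Positive prediction rate (share of predictions equal to 1) per subgroup."""
    counts = defaultdict(lambda: [0, 0])  # subgroup -> [positives, total]
    for pred, group in zip(y_pred, groups):
        counts[group][0] += pred
        counts[group][1] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}


def max_gap(y_pred, groups):
    """Largest difference in positive prediction rate between any two subgroups."""
    rates = subgroup_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())


def permutation_test(y_pred, groups, n_perm=1000, seed=0):
    """Compare the observed gap with gaps under randomly shuffled group labels.

    Shuffling keeps subgroup sizes and predictions fixed, so the permuted gaps
    show how large a disparity arises by chance alone. This is an assumed
    stand-in for a counterfactual "randomized membership" diagnostic.
    """
    rng = random.Random(seed)
    observed = max_gap(y_pred, groups)
    shuffled = list(groups)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_gap(y_pred, shuffled) >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (at_least_as_large + 1) / (n_perm + 1)
    return observed, p_value


def toy_cohort():
    """Hypothetical synthetic cohort: 20 patients per race x gender cell."""
    race, gender, y_pred = [], [], []
    for r, g, n_pos in [("A", "F", 16), ("A", "M", 4), ("B", "F", 4), ("B", "M", 16)]:
        for i in range(20):
            race.append(r)
            gender.append(g)
            y_pred.append(1 if i < n_pos else 0)
    return race, gender, y_pred


if __name__ == "__main__":
    race, gender, y_pred = toy_cohort()
    # Intersectional subgroups are formed by combining the two attributes.
    intersectional = [f"{r}|{g}" for r, g in zip(race, gender)]
    for name, groups in [("race", race), ("gender", gender), ("race x gender", intersectional)]:
        gap, p = permutation_test(y_pred, groups)
        print(f"{name}: gap={gap:.2f}, permutation p={p:.3f}")
```

On this synthetic data the race-only and gender-only gaps are both zero (permutation p-value of 1), while the race × gender gap is 0.6 and far exceeds anything produced by shuffled labels. This mirrors how intersectional evaluation can expose disparities that single-axis analyses miss. A gap whose permutation p-value is large would instead be comparable to what random group assignment produces, which is the pattern the counterfactual diagnostics reported for most disparities.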