The proliferation of artificial intelligence (AI) in radiology has shed light on the risk that deep learning (DL) models exacerbate clinical biases against vulnerable patient populations. While prior literature has focused on quantifying the biases exhibited by trained DL models, demographically targeted adversarial bias attacks on DL models and their implications in the clinical environment remain underexplored in medical imaging research. In this work, we demonstrate that demographically targeted label poisoning attacks can introduce undetectable underdiagnosis bias in DL models. Our results, across multiple performance metrics and across demographic groups defined by sex, age, and their intersectional subgroups, show that adversarial bias attacks are highly selective: they degrade model performance in the targeted group without impacting overall model performance. Furthermore, our results indicate that the resulting biased DL models propagate prediction bias even when evaluated on external datasets.
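To make the idea of a demographically targeted label poisoning attack concrete, the following is a minimal sketch and not the procedure used in this work. It assumes binary labels (1 = finding present, 0 = no finding), a single demographic attribute per sample, and an attack that flips a fraction of positive labels to negative only within the targeted group, which is one plausible way to induce underdiagnosis. The function names `poison_labels` and `false_negative_rate`, the `rate` parameter, and the toy data are all hypothetical.

```python
import random


def poison_labels(labels, groups, target_group, rate, seed=0):
    """Flip a fraction `rate` of positive labels (1 -> 0) in `target_group` only.

    Assumption: underdiagnosis is induced by relabeling positives as negatives.
    Returns the poisoned label list and the sorted indices that were flipped.
    """
    rng = random.Random(seed)
    # Positive cases that belong to the targeted demographic group.
    candidates = [
        i for i, (y, g) in enumerate(zip(labels, groups))
        if y == 1 and g == target_group
    ]
    n_flip = round(rate * len(candidates))
    flipped = set(rng.sample(candidates, n_flip))
    poisoned = [0 if i in flipped else y for i, y in enumerate(labels)]
    return poisoned, sorted(flipped)


def false_negative_rate(y_true, y_pred, groups=None, group=None):
    """FNR = FN / (FN + TP), optionally restricted to one demographic group."""
    positives = [
        p for i, (t, p) in enumerate(zip(y_true, y_pred))
        if t == 1 and (group is None or groups[i] == group)
    ]
    if not positives:
        return float("nan")
    return sum(1 for p in positives if p == 0) / len(positives)


if __name__ == "__main__":
    # Toy data (hypothetical): five samples in group "F", three in group "M".
    labels = [1, 1, 1, 1, 0, 0, 1, 1]
    groups = ["F", "F", "F", "F", "F", "M", "M", "M"]
    poisoned, flipped = poison_labels(labels, groups, target_group="F", rate=0.5)

    # Treat the poisoned labels as the output of a model that fit them exactly,
    # then compare group-wise and overall false negative rates.
    for g in ("F", "M", None):
        print(g or "overall", false_negative_rate(labels, poisoned, groups, g))
```

Comparing the group-wise false negative rate with the overall rate illustrates the selectivity described above: the error is concentrated in the targeted group, while the aggregate metric moves much less and the untargeted group is unaffected.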