The proliferation of artificial intelligence (AI) in radiology has shed light on the risk that deep learning (DL) models exacerbate clinical biases against vulnerable patient populations. While prior literature has focused on quantifying the biases exhibited by trained DL models, demographically targeted adversarial bias attacks on DL models and their implications in the clinical environment remain underexplored in medical imaging research. In this work, we demonstrate that demographically targeted label poisoning attacks can introduce adversarial underdiagnosis bias in DL models and degrade performance on underrepresented groups without impacting overall model performance. Moreover, our results across multiple performance metrics and demographic groups, such as sex, age, and their intersectional subgroups, indicate that a group's vulnerability to undetectable adversarial bias attacks is directly correlated with its representation in the model's training data.
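To make the attack described above concrete, the following is a minimal sketch of a demographically targeted label poisoning step. It is not the authors' implementation: the function name `poison_labels`, the `rate` parameter, and the choice to model underdiagnosis as flipping positive labels (1) to negative (0) within a single target group are all assumptions made for illustration.

```python
import numpy as np


def poison_labels(labels, groups, target_group, rate, seed=0):
    """Flip a fraction `rate` of positive labels to negative in one group.

    Hypothetical helper: labels are assumed binary (1 = finding present),
    and `groups` holds one demographic attribute value per sample.
    Returns the poisoned label array and the indices that were flipped.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()  # leave the caller's array untouched
    groups = np.asarray(groups)

    # Only positive samples from the targeted group are eligible for flipping.
    candidates = np.flatnonzero((groups == target_group) & (labels == 1))
    n_flip = int(round(rate * candidates.size))

    # Sample without replacement so each eligible label is flipped at most once.
    flipped = rng.choice(candidates, size=n_flip, replace=False)
    labels[flipped] = 0
    return labels, flipped


if __name__ == "__main__":
    y = np.array([1, 1, 1, 1, 0, 0, 1, 1])
    g = np.array(["F", "F", "F", "F", "F", "M", "M", "M"])
    y_poisoned, idx = poison_labels(y, g, target_group="F", rate=0.5)
    print("flipped indices:", sorted(idx.tolist()))
    print("poisoned labels:", y_poisoned.tolist())
```

Because only labels inside the target group change, a small group contributes few altered samples to the training set as a whole. Under the assumptions of this sketch, that is one intuition for why aggregate metrics can stay stable while subgroup performance degrades.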