Recent research has shown that artificial intelligence (AI) models can exhibit performance bias when trained on data that are imbalanced with respect to one or more protected attributes. Most work to date has focused on deep learning models, but classical AI techniques that rely on hand-crafted features may also be susceptible to such bias. In this paper we investigate the potential for race bias in random forest (RF) models trained on radiomics features. Our application is the prediction of tumour molecular subtype from dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) of breast cancer patients. Our results show that radiomics features derived from DCE-MRI data do contain race-identifiable information, and that RF models can be trained to predict White and Black race from these data with 60-70% accuracy, depending on the subset of features used. Furthermore, RF models trained to predict tumour molecular subtype using race-imbalanced data seem to produce biased behaviour, exhibiting better performance on test data from the race on which they were trained.
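The kind of comparison behind "better performance on test data from the race on which they were trained" can be illustrated by splitting test-set accuracy by protected group. The following is a minimal sketch, not the evaluation code used in the paper: the function names `accuracy_by_group` and `accuracy_gap` are hypothetical, the subtype labels are 0/1 placeholders, and the toy predictions are invented purely to show the calculation; they are not results from the study.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy}, splitting predictions by a protected attribute."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group, trained_on, other):
    """Accuracy on the group seen in training minus accuracy on the other group."""
    return per_group[trained_on] - per_group[other]


# Toy inputs (assumption: made-up values, NOT data from the paper).
# Labels 0/1 stand in for two tumour molecular subtypes.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 1, 0, 0]
groups = ["White"] * 4 + ["Black"] * 4

per_group = accuracy_by_group(y_true, y_pred, groups)
gap = accuracy_gap(per_group, trained_on="White", other="Black")
print(per_group)  # {'White': 0.75, 'Black': 0.5}
print(gap)        # 0.25
```

A positive gap for a model trained mostly on one group, mirrored by a negative gap when the training imbalance is reversed, would be consistent with the biased behaviour described above. In practice the comparison would also need confidence intervals or a significance test, which this sketch omits.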