We investigate performance disparities in deep classifiers. We find that the ability of classifiers to separate individuals into subgroups varies substantially across medical imaging modalities and protected characteristics; crucially, we show that this property is predictive of algorithmic bias. Through theoretical analysis and extensive empirical evaluation, we find a relationship between subgroup separability, subgroup disparities, and performance degradation when models are trained on data with systematic bias such as underdiagnosis. Our findings shed new light on the question of how models become biased, providing important insights for the development of fair medical imaging AI.