Does a Rising Tide Lift All Boats? Bias Mitigation for AI-based CMR Segmentation

Artificial intelligence (AI) is increasingly being used for medical imaging tasks. However, the resulting models can be biased, particularly when they are trained on imbalanced datasets. One such example is the strong race bias effect in cardiac magnetic resonance (CMR) image segmentation models. Although this phenomenon has been reported in a number of publications, little is known about the effectiveness of bias mitigation algorithms in this domain. We investigate how well common bias mitigation methods address bias between Black and White subjects in AI-based CMR segmentation models. Specifically, we use oversampling, importance reweighing and Group DRO, as well as combinations of these techniques, to mitigate the race bias. Furthermore, motivated by recent findings on the root causes of AI-based CMR segmentation bias, we evaluate the same methods using models trained and evaluated on cropped CMR images. We find that bias can be mitigated using oversampling, which significantly improves performance for the underrepresented Black subjects whilst not significantly reducing performance for the majority White subjects. Group DRO also improves performance for Black subjects, but not significantly, while reweighing decreases performance for Black subjects. Combining oversampling with Group DRO likewise improves performance for Black subjects, but not significantly. Using cropped images increases performance for both races and reduces the bias, and adding oversampling as a bias mitigation technique on top of cropped images reduces the bias further.
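The three mitigation strategies named above can be sketched generically. This is a minimal illustration under stated assumptions, not the authors' implementation: oversampling is assumed to mean drawing each race group with equal total probability, "importance reweighing" is assumed to mean per-sample loss weights proportional to the inverse group size, and Group DRO is shown as the common online variant that performs exponentiated-gradient ascent on per-group weights. The names `oversampling_probs`, `inverse_frequency_weights`, `GroupDRO` and the step size are hypothetical.

```python
import math
import random
from collections import Counter
from collections.abc import Hashable, Sequence


def oversampling_probs(groups: Sequence[Hashable]) -> list[float]:
    """Per-sample probabilities so that every group is drawn equally often.

    Assumption: a sample in group g gets probability 1 / (G * n_g), where G is
    the number of groups and n_g the size of group g, so each group's total
    probability mass is 1 / G.
    """
    counts = Counter(groups)
    n_groups = len(counts)
    return [1.0 / (n_groups * counts[g]) for g in groups]


def inverse_frequency_weights(groups: Sequence[Hashable]) -> list[float]:
    """Assumed reweighing scheme: loss weights proportional to 1 / n_g, rescaled to mean 1."""
    counts = Counter(groups)
    raw = [1.0 / counts[g] for g in groups]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


class GroupDRO:
    """Online Group DRO sketch: exponentiated-gradient ascent on group weights q."""

    def __init__(self, n_groups: int, step_size: float = 0.01) -> None:
        self.q = [1.0 / n_groups] * n_groups  # start from uniform group weights
        self.eta = step_size                   # hypothetical default step size

    def step(self, group_losses: Sequence[float]) -> float:
        # Multiply each weight by exp(eta * loss), so higher-loss groups gain weight.
        unnorm = [q * math.exp(self.eta * loss) for q, loss in zip(self.q, group_losses)]
        total = sum(unnorm)
        self.q = [u / total for u in unnorm]  # renormalise to a probability vector
        # Robust objective: q-weighted sum of the per-group losses.
        return sum(q * loss for q, loss in zip(self.q, group_losses))


if __name__ == "__main__":
    # Toy, made-up split of 8 White and 2 Black subjects (not the paper's data).
    groups = ["White"] * 8 + ["Black"] * 2
    probs = oversampling_probs(groups)
    batch = random.Random(0).choices(range(len(groups)), weights=probs, k=4)
    print(batch, inverse_frequency_weights(groups))
```

In a segmentation training loop, the losses passed to `GroupDRO.step` would be, for example, the mean segmentation loss of each race group in the current minibatch, with the resulting weights `q` treated as constants when back-propagating the weighted loss. Oversampling changes which images are seen, whereas reweighing and Group DRO change how much each image's loss counts, which is why the two families can be combined.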
