Medical imaging cohorts are often confounded by factors such as acquisition devices, hospital sites, and patient backgrounds. As a result, deep learning models tend to learn spurious correlations instead of causally related features, which limits their generalizability to unseen data. This problem can be addressed by minimizing a dependence measure between the intermediate representations of task-related and non-task-related variables. Such measures include mutual information, distance correlation, and the performance of adversarial classifiers. Here, we benchmark these dependence measures for the task of preventing shortcut learning. We study a simplified setting using Morpho-MNIST and a medical imaging task with CheXpert chest radiographs. Our results provide insights into how to mitigate confounding factors in medical imaging.
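To make one of the dependence measures named above concrete, the following is a minimal sketch of the standard biased sample estimator of distance correlation between two batches of feature vectors. It is not the implementation benchmarked in this work: the function names and the use of NumPy are assumptions made for illustration, and a training setup would more plausibly compute the same quantity in a differentiable framework so that it can be minimized as a loss term.

```python
import numpy as np


def _centered_distances(x):
    """Pairwise Euclidean distance matrix of x, double-centered."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D input as n samples of dimension 1
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    # Subtract column means and row means, then add back the grand mean.
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def distance_correlation(x, y):
    """Biased sample distance correlation between x and y.

    x and y hold one sample per row and must have the same number of rows.
    The value lies in [0, 1]; values near 0 suggest weak dependence.
    """
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()   # squared sample distance covariance
    dvar_x = (a * a).mean()  # squared sample distance variance of x
    dvar_y = (b * b).mean()  # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0           # at least one input is constant
    return float(np.sqrt(max(dcov2, 0.0) / denom))


# Hypothetical usage: z_task and z_conf stand in for intermediate
# representations of the task-related and non-task-related variables.
rng = np.random.default_rng(0)
z_task = rng.normal(size=(500, 4))
z_conf = rng.normal(size=(500, 4))
penalty = distance_correlation(z_task, z_conf)
```

Unlike Pearson correlation, this estimator also responds to nonlinear dependence, which is one reason it is a candidate for penalizing the sharing of information between representations. The pairwise distance matrices cost memory quadratic in the batch size.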