We study whether deep networks for medical imaging learn useful nonrobust features (predictive input patterns that are not human-interpretable and are highly susceptible to small adversarial perturbations) and how these features affect test performance. We show that models trained only on nonrobust features achieve well-above-chance accuracy across five MedMNIST classification tasks, confirming their predictive value in-distribution. Conversely, adversarially trained models, which rely primarily on robust features, sacrifice in-distribution accuracy but perform markedly better under controlled distribution shifts (MedMNIST-C). Overall, nonrobust features boost standard accuracy yet degrade out-of-distribution performance, revealing a practical robustness-accuracy trade-off in medical imaging classification; how that trade-off is resolved should be tailored to the requirements of the deployment setting.