Adversarial attacks pose significant threats to the reliability and safety of deep learning models, especially in critical domains such as medical imaging. This paper introduces a novel framework that integrates conformal prediction with game-theoretic defensive strategies to enhance model robustness against both known and unknown adversarial perturbations. We address three primary research questions: constructing valid and efficient conformal prediction sets under known attacks (RQ1), ensuring coverage under unknown attacks through conservative thresholding (RQ2), and determining optimal defensive strategies within a zero-sum game framework (RQ3). Our methodology involves training specialized defensive models against specific attack types and employing maximum and minimum classifiers to aggregate defenses effectively. Extensive experiments conducted on the MedMNIST datasets, including PathMNIST, OrganAMNIST, and TissueMNIST, demonstrate that our approach maintains high coverage guarantees while minimizing prediction set sizes. The game-theoretic analysis reveals that the optimal defensive strategy often converges to a singular robust model, outperforming uniform and simple strategies across all evaluated datasets. This work advances the state-of-the-art in uncertainty quantification and adversarial robustness, providing a reliable mechanism for deploying deep learning models in adversarial environments.
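The conformal prediction sets referenced above (RQ1) can be illustrated with a minimal split-conformal sketch. This is not the paper's implementation; it assumes softmax probabilities are available from a trained classifier, and the function names (`conformal_threshold`, `prediction_set`) are illustrative.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Threshold from a held-out calibration set.

    Nonconformity score: 1 minus the probability assigned to the
    true class. The finite-sample-corrected quantile yields the
    marginal (1 - alpha) coverage guarantee of split conformal
    prediction.
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(scores, min(q_level, 1.0), method="higher")

def prediction_set(test_probs, qhat):
    """Include every class whose nonconformity score is within the threshold."""
    return [np.where(1.0 - p <= qhat)[0] for p in test_probs]
```

A conservative variant for unknown attacks (RQ2) would compute `qhat` over the worst case across several attack-specific calibration sets and take the largest threshold, trading larger prediction sets for retained coverage.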