Adversarial Medical Image with Hierarchical Feature Hiding

Deep-learning-based methods for medical images can be easily compromised by adversarial examples (AEs), posing a serious security risk in clinical decision-making. It has been discovered that conventional adversarial attacks such as PGD, which optimize the classification logits, produce AEs that are easy to distinguish in the feature space, which makes accurate reactive defenses possible. To better understand this phenomenon and reassess the reliability of reactive defenses against medical AEs, we thoroughly investigate the characteristics of conventional medical AEs. Specifically, we first theoretically prove that conventional adversarial attacks change the outputs by continuously optimizing vulnerable features in a fixed direction, thereby producing outlier representations in the feature space. Then, a stress test is conducted to reveal the vulnerability of medical images in comparison with natural images. Interestingly, this vulnerability is a double-edged sword: it can also be exploited to hide AEs. We then propose a simple yet effective hierarchical feature constraint (HFC), a novel add-on to conventional white-box attacks, which helps hide the adversarial features within the target feature distribution. The proposed method is evaluated on three medical datasets, both 2D and 3D, with different modalities. The experimental results demonstrate the superiority of HFC, \emph{i.e.,} it bypasses an array of state-of-the-art medical AE detectors more efficiently than competing adaptive attacks, which reveals the deficiencies of medical reactive defenses and can inform the development of more robust defenses in the future.
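
To make the mechanism concrete, the following is a minimal sketch rather than the exact formulation used in this work. The first equation is the standard targeted PGD update on the classification loss. The second shows one plausible way a hierarchical feature constraint could be attached to it. The symbols $h$ (the classifier), $f^{l}$ (the feature map at layer $l$), $p_{l}$ (an assumed density model of target-class features at layer $l$), and $\lambda_{l}$ (per-layer weights) are illustrative assumptions and are not defined in the text above.

\begin{align}
x_{t+1} &= \Pi_{\epsilon}\!\left( x_{t} - \alpha \,\operatorname{sign}\!\left( \nabla_{x} J\big(h(x_{t}),\, c\big) \right) \right), \\
x_{t+1} &= \Pi_{\epsilon}\!\left( x_{t} - \alpha \,\operatorname{sign}\!\left( \nabla_{x} \Big[ J\big(h(x_{t}),\, c\big) - \sum_{l=1}^{L} \lambda_{l} \log p_{l}\big(f^{l}(x_{t})\big) \Big] \right) \right).
\end{align}

Here $J$ is the cross-entropy loss, $c$ is the target class, $\alpha$ is the step size, and $\Pi_{\epsilon}$ projects back onto the $L_{\infty}$ ball of radius $\epsilon$ around the clean image. In the first update only the logits are optimized, so intermediate features are free to drift in a fixed direction and become outliers. In the second, under the stated assumptions, the added term rewards features at every layer for having high likelihood under the target-class distribution, which is the sense in which the adversarial representation is hidden.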
