Deep Neural Networks (DNNs) have recently made significant progress in many fields. However, studies have shown that DNNs are vulnerable to adversarial examples: imperceptible perturbations can severely mislead a DNN even when the attacker cannot access the model's full parameters. Various defense methods have been proposed, such as feature compression and gradient masking. However, numerous studies have shown that previous methods are designed to detect or defend against specific attacks, which renders them ineffective against the latest, previously unknown attack methods. The imperceptibility of adversarial perturbations is one of the criteria by which adversarial attacks are evaluated. This also means that adversarial and normal examples differ in the local correlation of their high-frequency information, and this difference can serve as an effective feature for distinguishing the two. We therefore propose an adversarial example detection framework based on a high-frequency information enhancement strategy, which effectively extracts and amplifies the feature differences between adversarial and normal examples. Experimental results show that, under this framework, the feature augmentation module can be combined with existing detection models in a modular way. This improves detector performance and reduces deployment cost without modifying the existing detection model.
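The abstract does not state how high-frequency information is extracted or enhanced. As one possible realization, offered only as a sketch, the Python code below does three things:

1. It removes a low-frequency disk from an image's 2-D Fourier spectrum to isolate high-frequency content.
2. It adds an amplified copy of that content back to the input.
3. It wraps an unchanged detector so that only the detector's input is transformed.

The FFT high-pass mask, the parameters `cutoff` and `alpha`, and the names `high_frequency_component`, `enhance_high_frequency`, and `EnhancedDetector` are hypothetical assumptions, not the authors' method. The sketch also does not model local correlation explicitly.

```python
import numpy as np


def high_frequency_component(image: np.ndarray, cutoff: int = 8) -> np.ndarray:
    """Isolate high-frequency content of a 2-D grayscale image with an FFT high-pass mask.

    Hypothetical choice: frequencies within `cutoff` bins of the (shifted)
    spectrum centre are treated as low frequency and zeroed out.
    """
    h, w = image.shape
    # Move the zero-frequency (DC) term to index (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.ogrid[:h, :w]
    low_mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= cutoff ** 2
    spectrum[low_mask] = 0  # discard the low-frequency disk
    # Undo the shift and return to the spatial domain; the imaginary part is round-off.
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))


def enhance_high_frequency(image: np.ndarray, alpha: float = 2.0, cutoff: int = 8) -> np.ndarray:
    """Amplify high-frequency content: image + alpha * high-pass(image) (hypothetical rule)."""
    return image + alpha * high_frequency_component(image, cutoff)


class EnhancedDetector:
    """Hypothetical wrapper: preprocesses the input, leaves the wrapped detector untouched."""

    def __init__(self, detector, alpha: float = 2.0, cutoff: int = 8):
        self.detector = detector  # any callable mapping an image to a score
        self.alpha = alpha
        self.cutoff = cutoff

    def __call__(self, image: np.ndarray):
        return self.detector(enhance_high_frequency(image, self.alpha, self.cutoff))


if __name__ == "__main__":
    # A checkerboard is pure high frequency, so it passes the filter unchanged.
    board = np.indices((32, 32)).sum(axis=0) % 2 * 2.0 - 1.0
    print(np.allclose(high_frequency_component(board), board))  # True
    # A constant image is pure DC, so its high-frequency part is (numerically) zero.
    print(np.allclose(high_frequency_component(np.ones((32, 32))), 0.0))  # True
```

For example, `EnhancedDetector(my_detector)(image)` passes the enhanced image to a hypothetical `my_detector` without touching its weights. This mirrors the modular deployment the abstract describes. A real module might instead stack the high-frequency map as an extra input channel or learn the filter rather than fixing it.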