Deep neural networks are highly vulnerable to adversarial examples crafted with tiny malicious perturbations. Most conventional adversarial attacks keep adversarial examples visually indistinguishable from the corresponding raw images by minimizing the geometric distance between them, but this constraint limits attack transferability, degrades visual quality, and yields perturbations that humans cannot perceive and therefore cannot readily interpret. In this paper, we propose a supervised semantic-transformation generative model that generates adversarial examples with real and legitimate semantics. For the first time, the model constructs an unrestricted adversarial manifold containing continuous semantic variations, which realizes a legitimate transition from non-adversarial examples to adversarial ones. Comprehensive experiments on MNIST and industrial defect datasets show that our adversarial examples not only exhibit better visual quality but also achieve superior attack transferability and provide more effective explanations of model vulnerabilities, indicating their great potential as generic adversarial examples. The code and pre-trained models are available at https://github.com/shuaili1027/MAELS.git.