Unrestricted adversarial attacks typically manipulate the semantic content of an image (e.g., color or texture) to create adversarial examples that are both effective and photorealistic, showing that they can stealthily deceive both human perception and deep neural networks. However, current works usually sacrifice the degree of unrestrictedness and subjectively select certain image content to guarantee the photorealism of unrestricted adversarial examples, which limits their attack performance. To ensure the photorealism of adversarial examples while boosting attack performance, we propose a novel unrestricted attack framework called Content-based Unrestricted Adversarial Attack. By leveraging a low-dimensional manifold that represents natural images, we map images onto the manifold and optimize them along its adversarial direction. Within this framework, we implement Adversarial Content Attack (ACA) based on Stable Diffusion, which can generate highly transferable unrestricted adversarial examples with various adversarial contents. Extensive experiments and visualizations demonstrate the efficacy of ACA, which surpasses state-of-the-art attacks by an average of 13.3-50.4% on normally trained models and 16.8-48.0% on defense methods.
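To make the "map onto the manifold, then optimize along an adversarial direction" idea concrete, here is a minimal toy sketch. It is not ACA and does not use Stable Diffusion. It assumes a hypothetical linear decoder `W` with orthonormal columns as a stand-in for the natural-image manifold and a hypothetical linear classifier `A` as the victim model. The helpers `encode`, `decode`, `latent_attack`, and the parameters `lr`, `steps`, and `eps` are illustrative names only. The attack performs gradient ascent on the cross-entropy loss with respect to the latent code `z` and keeps `z` within an L2 ball of radius `eps` around the original code, so every adversarial example it returns lies on the manifold by construction.

```python
import math

# Hypothetical toy setup (not ACA): a linear "decoder" W (d x k) with orthonormal
# columns spans a k-dimensional manifold in R^d; a linear classifier A (c x d) gives logits.

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def encode(W, x):
    """Map x onto the manifold: with orthonormal columns, W^T x gives the projection coefficients."""
    return matvec(transpose(W), x)

def decode(W, z):
    """Every decoded point W z lies on the manifold by construction."""
    return matvec(W, z)

def loss(A, W, z, y):
    """Cross-entropy of the true label y for the decoded image."""
    p = softmax(matvec(A, decode(W, z)))
    return -math.log(p[y])

def latent_attack(A, W, x, y, lr=0.5, steps=20, eps=1.0):
    """Gradient ascent on the loss w.r.t. the latent z, kept in an L2 ball of radius eps around z0."""
    z0 = encode(W, x)
    z = list(z0)
    At, Wt = transpose(A), transpose(W)
    for _ in range(steps):
        p = softmax(matvec(A, decode(W, z)))
        # dL/dlogits = p - onehot(y); chain rule back through A (logits) and W (decoder)
        r = [p_i - (1.0 if i == y else 0.0) for i, p_i in enumerate(p)]
        grad_z = matvec(Wt, matvec(At, r))
        z = [z_i + lr * g for z_i, g in zip(z, grad_z)]
        # Project back onto the eps-ball around the original latent code
        delta = [z_i - z0_i for z_i, z0_i in zip(z, z0)]
        norm = math.sqrt(sum(d * d for d in delta))
        if norm > eps:
            z = [z0_i + d * eps / norm for z0_i, d in zip(z0, delta)]
    return decode(W, z), z

if __name__ == "__main__":
    A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]    # class-0 logit = x[0], class-1 logit = x[1]
    W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # toy manifold: the x[0]-x[1] plane
    x = [1.0, 0.5, 2.0]                       # classified as class 0; x[2] is off-manifold
    x_adv, z = latent_attack(A, W, x, y=0)
    print("adversarial logits:", matvec(A, x_adv))
```

In this toy run the original input is classified as class 0, and the returned `x_adv` is classified as class 1. Its off-manifold coordinate `x[2]` is zero because the perturbation exists only in latent coordinates. In ACA the manifold is supplied by Stable Diffusion rather than a fixed linear map, which this sketch does not attempt to model.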