Adversarial attacks in the physical world, particularly patch attacks, pose significant threats to the robustness and reliability of deep learning models. Developing reliable defenses against patch attacks is crucial for real-world applications, yet research in this area remains severely lacking. In this paper, we propose DIFFender, a novel defense method that leverages a pre-trained diffusion model to both localize and defend against potential adversarial patch attacks. DIFFender is a pipeline with two main stages: patch localization followed by restoration. In the localization stage, we exploit intriguing properties of the diffusion model to effectively identify where adversarial patches lie. In the restoration stage, we employ a text-guided diffusion model to remove the adversarial regions of the image while preserving the integrity of the visual content. Additionally, we design a few-shot prompt-tuning algorithm that optimizes the two stages jointly, enabling simple and efficient tuning and allowing the learned representations to transfer easily to downstream tasks. Extensive experiments on image classification and face recognition demonstrate that DIFFender exhibits superior robustness under strong adaptive attacks and generalizes well across various scenarios, diverse classifiers, and multiple attack methods.
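The two-stage structure (localize, then restore only the flagged region) can be illustrated with a toy sketch. This is not DIFFender itself: the abstract does not specify the localization criterion, so the sketch assumes a simple reconstruction-discrepancy heuristic (pixels that a "denoised" reconstruction changes strongly are flagged). The hypothetical `box_blur` stands in for the diffusion model's reconstruction, the hypothetical `mean_inpaint` stands in for text-guided diffusion inpainting, and the joint prompt tuning is omitted entirely.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # grayscale image, values in [0, 1]
Mask = List[List[bool]]


def box_blur(img: Image) -> Image:
    """Hypothetical stand-in for a diffusion model's denoised reconstruction."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [img[y][x]
                    for y in range(max(0, i - 1), min(h, i + 2))
                    for x in range(max(0, j - 1), min(w, j + 2))]
            out[i][j] = sum(vals) / len(vals)
    return out


def localize(img: Image, reconstruct: Callable[[Image], Image], threshold: float) -> Mask:
    """Stage 1 (assumed criterion): flag pixels the reconstruction changes by more than threshold."""
    recon = reconstruct(img)
    return [[abs(a - b) > threshold for a, b in zip(row, rrow)]
            for row, rrow in zip(img, recon)]


def mean_inpaint(img: Image, mask: Mask) -> Image:
    """Hypothetical stand-in for inpainting: fill with the mean of unflagged pixels."""
    clean = [p for row, mrow in zip(img, mask) for p, m in zip(row, mrow) if not m]
    fill_value = sum(clean) / len(clean) if clean else 0.0
    return [[fill_value] * len(row) for row in img]


def restore(img: Image, mask: Mask, fill: Callable[[Image, Mask], Image]) -> Image:
    """Stage 2: replace only the flagged pixels and keep the rest of the image intact."""
    filled = fill(img, mask)
    return [[f if m else p for p, m, f in zip(prow, mrow, frow)]
            for prow, mrow, frow in zip(img, mask, filled)]


def defend(img: Image, threshold: float = 0.3) -> Tuple[Mask, Image]:
    """Toy pipeline: localization followed by restoration of the localized region."""
    mask = localize(img, box_blur, threshold)
    return mask, restore(img, mask, mean_inpaint)
```

For example, a bright 2x2 "patch" on an otherwise black 8x8 image is flagged exactly, and restoration returns it to the background value while untouched pixels keep their original values. Keeping the restoration step restricted to the mask mirrors the stated goal of preserving the rest of the visual content.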