Vision Language Models (VLMs) have shown remarkable capabilities in multimodal understanding, yet their susceptibility to adversarial perturbations poses a significant threat to their reliability in real-world applications. Although often imperceptible to humans, these perturbations can drastically alter model outputs, leading to erroneous interpretations and decisions. This paper introduces DiffCAP, a novel diffusion-based purification strategy that effectively neutralizes adversarial corruptions in VLMs. We theoretically establish a provable recovery region in the forward diffusion process and quantify the convergence rate of semantic variation with respect to VLMs. These findings show that adversarial effects fade monotonically as diffusion unfolds. Guided by this principle, DiffCAP injects noise using a similarity threshold on VLM embeddings as an adaptive criterion, and then applies reverse diffusion to restore a clean and reliable representation for VLM inference. Through extensive experiments across six datasets with three VLMs under varying attack strengths in three task scenarios, we show that DiffCAP outperforms existing defense techniques by a substantial margin. Notably, DiffCAP significantly reduces both hyperparameter tuning complexity and the required diffusion time, thereby accelerating the denoising process. Backed by theoretical results and empirical evidence, DiffCAP provides a robust and practical solution for securely deploying VLMs in adversarial environments. The source code is available at https://github.com/JasonFu1998/DiffCAP.
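The adaptive noise-injection idea described above can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes, as one plausible reading of the abstract, that the similarity is measured between VLM embeddings of consecutive noisy steps and that injection stops once it reaches the threshold. The names `adaptive_purify`, `embed`, `denoise`, `threshold`, `beta`, and `max_steps` are hypothetical, and the encoder and denoiser are placeholders supplied by the caller.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two arrays, flattened to vectors."""
    a, b = np.ravel(a), np.ravel(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def adaptive_purify(image, embed, denoise, threshold=0.95, beta=0.02,
                    max_steps=100, seed=0):
    """Inject Gaussian noise until embeddings stabilize, then denoise.

    embed:   callable mapping an image array to an embedding vector
             (stand-in for a VLM image encoder; an assumption).
    denoise: callable (noisy_image, num_steps) -> image
             (stand-in for reverse diffusion; an assumption).
    Returns the purified image and the number of forward steps used.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(image, dtype=np.float64)
    prev = embed(x)
    steps = 0
    for steps in range(1, max_steps + 1):
        # One variance-preserving forward diffusion step with fixed beta.
        noise = rng.standard_normal(x.shape)
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
        cur = embed(x)
        # Assumed stopping rule: consecutive embeddings are similar enough.
        if cosine_similarity(prev, cur) >= threshold:
            break
        prev = cur
    return denoise(x, steps), steps


# Toy usage with placeholder components (not a real VLM or diffusion model).
toy_rng = np.random.default_rng(1)
projection = toy_rng.standard_normal((16, 8 * 8 * 3))
toy_embed = lambda img: projection @ img.ravel()
toy_denoise = lambda img, t: np.clip(img, 0.0, 1.0)

toy_image = toy_rng.uniform(0.0, 1.0, size=(8, 8, 3))
purified, used_steps = adaptive_purify(toy_image, toy_embed, toy_denoise)
```

Because the stopping point depends on the input rather than on a fixed timestep, the number of forward steps does not have to be tuned per attack strength in this sketch. In practice `embed` would wrap a real VLM image encoder and `denoise` a pretrained diffusion model's reverse process.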