Large language models have become increasingly prominent, and their rise signals a shift towards multimodality as the next frontier in artificial intelligence, where embeddings are harnessed as prompts to generate textual content. Vision-language models (VLMs) stand at the forefront of this advancement, offering innovative ways to combine visual and textual data for enhanced understanding and interaction. However, this integration also enlarges the attack surface. Patch-based adversarial attacks are considered the most realistic threat model in physical vision applications, as demonstrated in the existing literature. In this paper, we propose to address patched visual prompt injection, in which adversaries exploit adversarial patches to make VLMs generate target content. Our investigation reveals that patched adversarial prompts are sensitive to pixel-wise randomization, a trait that persists even against adaptive attacks designed to counteract such defenses. Leveraging this insight, we introduce SmoothVLM, a defense mechanism rooted in smoothing techniques and specifically tailored to protect VLMs from patched visual prompt injectors. Our framework lowers the attack success rate to between 0% and 5.0% on two leading VLMs, while recovering approximately 67.3% to 95.0% of the benign images' context, demonstrating a balance between security and usability.
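The core observation above, that patched adversarial prompts are sensitive to pixel-wise randomization while benign content is not, can be sketched as a majority vote over randomized copies of the input. The snippet below is a minimal illustration, not the paper's actual method: `query_fn` stands in for a real VLM query, and the patch detector, image size, and randomization rate are all hypothetical choices for demonstration.

```python
import numpy as np
from collections import Counter

def pixelwise_randomize(image: np.ndarray, flip_prob: float, rng) -> np.ndarray:
    """Replace a random fraction of pixels with uniform noise.

    A small flip probability leaves benign content largely intact,
    but tends to break the precise pixel pattern an adversarial
    patch depends on.
    """
    mask = rng.random(image.shape[:2]) < flip_prob
    noisy = image.copy()
    noisy[mask] = rng.integers(0, 256, size=(int(mask.sum()), image.shape[2]))
    return noisy

def smooth_query(image: np.ndarray, query_fn, n_copies: int = 21,
                 flip_prob: float = 0.15, seed: int = 0) -> str:
    """Query the model on randomized copies and majority-vote the responses."""
    rng = np.random.default_rng(seed)
    responses = [query_fn(pixelwise_randomize(image, flip_prob, rng))
                 for _ in range(n_copies)]
    return Counter(responses).most_common(1)[0][0]
```

In this toy setting, a stand-in model that emits attacker-controlled output only when a patch region survives intact will, under the vote, usually fall back to the benign response, since most randomized copies disturb at least one patch pixel.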