With the rapid advancement of multimodal learning, pre-trained Vision-Language Models (VLMs) such as CLIP have demonstrated a remarkable capacity to bridge the visual and language modalities. However, these models remain vulnerable to adversarial attacks, particularly in the image modality, which poses considerable security risks. This paper introduces Adversarial Prompt Tuning (AdvPT), a novel technique for enhancing the adversarial robustness of image encoders in VLMs. AdvPT learns text prompts and aligns them with adversarial image embeddings, addressing the vulnerabilities inherent in VLMs without extensive parameter training or modification of the model architecture. We demonstrate that AdvPT improves resistance to both white-box and black-box adversarial attacks and acts synergistically with existing image-processing-based defense techniques, further boosting defensive capability. Comprehensive experimental analyses provide insights into adversarial prompt tuning, a new paradigm that improves resistance to adversarial images by modifying the textual input, paving the way for future research on robust multimodal learning and opening new possibilities for enhancing the security of VLMs. Our code is available at https://github.com/jiamingzhang94/Adversarial-Prompt-Tuning.
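The alignment between learnable text prompts and adversarial image embeddings can be written as an optimization problem. The following is one plausible formalization, not necessarily the exact objective used in the paper. It assumes a frozen image encoder $f_I$ and a frozen text encoder $f_T$, a prompt for class $c$ built from $M$ learnable context vectors $V = (v_1, \dots, v_M)$ followed by the class-name embedding $w_c$, adversarial images bounded in $\ell_\infty$ norm by $\epsilon$, and a cross-entropy loss over cosine similarities with temperature $\tau$. All of these symbols and choices are assumptions introduced here for illustration.

```latex
% Assumed notation: (x_i, y_i) are N labelled training images over C classes.
% Adversarial embedding of image x_i (the image encoder f_I stays frozen):
z_i' = f_I(x_i + \delta_i), \qquad \|\delta_i\|_\infty \le \epsilon

% Text embedding of class c under the learnable context V (f_T stays frozen):
g_c(V) = f_T\big([v_1, \dots, v_M, w_c]\big)

% Only V is optimized, so that each adversarial embedding is pulled
% toward the text embedding of its true class:
\min_{V} \; -\frac{1}{N} \sum_{i=1}^{N}
  \log \frac{\exp\big(\cos(z_i', g_{y_i}(V)) / \tau\big)}
            {\sum_{c=1}^{C} \exp\big(\cos(z_i', g_c(V)) / \tau\big)}
```

Under these assumptions the only trainable parameters are the $M$ context vectors, which is consistent with the claim that no extensive parameter training or architectural change is needed.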