Image Hijacking: Adversarial Images can Control Generative Models at Runtime

Are foundation models secure from malicious actors? In this work, we focus on the image input to a vision-language model (VLM). We discover image hijacks, adversarial images that control generative models at runtime. We introduce Behavior Matching, a general method for creating image hijacks, and we use it to explore three types of attacks. Specific string attacks generate arbitrary output of the adversary's choosing. Leak context attacks leak information from the context window into the output. Jailbreak attacks circumvent a model's safety training. We study these attacks against LLaVA-2, a state-of-the-art VLM based on CLIP and LLaMA-2, and find that all our attack types have a success rate above 90%. Moreover, our attacks are automated and require only small image perturbations. These findings raise serious concerns about the security of foundation models. If image hijacks are as difficult to defend against as adversarial examples in CIFAR-10, then it might be many years before a solution is found -- if it even exists.
