Large Visual Language Models (VLMs) such as GPT-4 have achieved remarkable success in generating comprehensive and nuanced responses, surpassing the capabilities of large language models. However, the integration of visual inputs introduces new security concerns, as malicious attackers can exploit multiple modalities to achieve their objectives. This has drawn increasing attention to the vulnerability of VLMs to jailbreak attacks. Most existing research focuses on generating adversarial images or nonsensical image collections to compromise these models. However, the challenge of leveraging meaningful images to produce targeted textual content by exploiting the VLMs' logical comprehension of images remains unexplored. In this paper, we study the problem of logical jailbreak from meaningful images to text. To investigate this issue, we introduce a novel dataset designed to evaluate jailbreaks via flowchart images. Furthermore, we develop a framework for text-to-text jailbreak using VLMs. Finally, we conduct an extensive evaluation of the framework on GPT-4o and GPT-4-vision-preview, achieving jailbreak rates of 92.8% and 70.0%, respectively. Our research reveals significant vulnerabilities in current VLMs with respect to image-to-text jailbreak. These findings underscore the need for a deeper examination of the security flaws in VLMs before their practical deployment.