While Multimodal Large Language Models (MLLMs) show remarkable capabilities, their safety alignment is susceptible to jailbreak attacks. Existing attack methods typically focus on text-image interplay, treating the visual modality as a secondary prompt. This approach underutilizes the unique potential of images to carry complex, contextual information. To address this gap, we propose a new image-centric attack method, Contextual Image Attack (CIA), which employs a multi-agent system to subtly embed harmful queries into seemingly benign visual contexts using four distinct visualization strategies. To further enhance the attack's efficacy, the system incorporates contextual element enhancement and automatic toxicity obfuscation techniques. Experimental results on the MMSafetyBench-tiny dataset show that CIA achieves high toxicity scores of 4.73 and 4.83 against the GPT-4o and Qwen2.5-VL-72B models, respectively, with Attack Success Rates (ASR) reaching 86.31\% and 91.07\%. Our method significantly outperforms prior work, demonstrating that the visual modality itself is a potent vector for jailbreaking advanced MLLMs.