The rapid evolution of multimodal foundation models has led to significant advances in cross-modal understanding and generation across diverse modalities, including text, images, audio, and video. However, these models remain susceptible to jailbreak attacks, which can bypass built-in safety mechanisms and induce the production of potentially harmful content. Consequently, understanding both jailbreak attack methods and existing defense mechanisms is essential for the safe deployment of multimodal generative models in real-world scenarios, particularly in security-sensitive applications. To provide comprehensive insight into this topic, this survey reviews jailbreak attacks and defenses in multimodal generative models. First, given the generalized lifecycle of a multimodal jailbreak, we systematically explore attacks and corresponding defense strategies across four levels: input, encoder, generator, and output. Based on this analysis, we present a detailed taxonomy of attack methods, defense mechanisms, and evaluation frameworks specific to multimodal generative models. Additionally, we cover a wide range of input-output configurations, including Any-to-Text, Any-to-Vision, and Any-to-Any generative systems. Finally, we highlight current research challenges and propose potential directions for future research. The open-source repository corresponding to this work can be found at https://github.com/liuxuannan/Awesome-Multimodal-Jailbreak.