The rapid development of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) has exposed vulnerabilities to various adversarial attacks. This paper provides a comprehensive overview of jailbreaking research targeting both LLMs and MLLMs, highlighting recent advancements in evaluation benchmarks, attack techniques, and defense strategies. Compared to the more advanced state of unimodal jailbreaking, the multimodal domain remains underexplored. We summarize the limitations and potential research directions of multimodal jailbreaking, aiming to inspire future research and further enhance the robustness and security of MLLMs.