Drawn by the impressive capabilities of Multimodal Large Language Models (MLLMs), the public increasingly uses them to improve the efficiency of daily work. However, the vulnerability of MLLMs to unsafe instructions poses serious safety risks when these models are deployed in real-world scenarios. In this paper, we systematically survey current efforts on the evaluation, attack, and defense of MLLM safety on images and text. We begin with an overview of MLLMs for images and text and of how safety is understood, which clarifies the scope of this survey. Then, we review the datasets and metrics used to evaluate MLLM safety. Next, we comprehensively present attack and defense techniques related to MLLM safety. Finally, we analyze several open problems and discuss promising research directions. The latest papers are continually collected at https://github.com/isXinLiu/MLLM-Safety-Collection.