Attracted by the impressive capabilities of Multimodal Large Language Models (MLLMs), the public is increasingly using them to improve the efficiency of daily work. Nonetheless, the vulnerability of MLLMs to unsafe instructions poses serious safety risks when these models are deployed in real-world scenarios. In this paper, we systematically survey current efforts on the evaluation, attack, and defense of MLLM safety for images and text. We begin with an overview of image-and-text MLLMs and of what safety means in this context, which clarifies the scope of our survey. Then, we review the evaluation datasets and metrics used to measure the safety of MLLMs. Next, we comprehensively present attack and defense techniques related to MLLM safety. Finally, we analyze several unsolved issues and discuss promising research directions.