Multimodal large language models (MLLMs) are rapidly evolving, presenting increasingly complex safety challenges. However, current risk-oriented dataset construction methods fail to cover the growing complexity of real-world multimodal safety (RMS) scenarios, and due to the lack of a unified evaluation metric, their overall effectiveness remains unproven. This paper introduces a novel image-oriented, self-adaptive dataset construction method for RMS, which starts from images and ends by constructing paired texts and guidance responses. Using this image-oriented method, we automatically generate an RMS dataset comprising 35k image-text pairs with guidance responses. Additionally, we introduce a standardized evaluation metric for safety datasets: fine-tuning a safety judge model on the dataset and evaluating its capabilities on other safety datasets. Extensive experiments on various tasks demonstrate the effectiveness of the proposed image-oriented pipeline. The results confirm the scalability and effectiveness of the image-oriented approach, offering a new perspective on the construction of real-world multimodal safety datasets. The dataset is available at https://huggingface.co/datasets/NewCityLetter/RMS2/tree/main.