As Multimodal Large Language Models (MLLMs) become indispensable assistants in daily human life, the unsafe content they generate poses a danger to human behavior, hanging perpetually over society like a sword of Damocles. To investigate and evaluate the safety impact of MLLM responses on human behavior in daily life, we introduce SaLAD, a multimodal safety benchmark containing 2,013 real-world image-text samples across 10 common categories, with a balanced design that covers both unsafe scenarios and cases of oversensitivity. SaLAD emphasizes realistic risk exposure, authentic visual inputs, and fine-grained cross-modal reasoning, ensuring that safety risks cannot be inferred from text alone. We further propose a safety-warning-based evaluation framework that encourages models to provide clear and informative safety warnings rather than generic refusals. Results on 18 MLLMs show that even the top-performing models achieve a safe response rate of only 57.2% on unsafe queries. Moreover, models equipped with popular safety alignment methods remain of limited effectiveness in our scenarios, revealing the vulnerabilities of current MLLMs in identifying dangerous behaviors in daily life. Our dataset is available at https://github.com/xinyuelou/SaLAD.