The increasing deployment of Large Vision-Language Models (LVLMs) raises safety concerns under potentially malicious inputs. However, existing multimodal safety evaluations primarily focus on model vulnerabilities exposed by static image inputs, ignoring the temporal dynamics of video that may induce distinct safety risks. To bridge this gap, we introduce Video-SafetyBench, the first comprehensive benchmark designed to evaluate the safety of LVLMs under video-text attacks. It comprises 2,264 video-text pairs spanning 48 fine-grained unsafe categories, each pairing a synthesized video with either a harmful query, which contains explicit malice, or a benign query, which appears harmless but triggers harmful behavior when interpreted alongside the video. To generate semantically accurate videos for safety evaluation, we design a controllable pipeline that decomposes video semantics into subject images (what is shown) and motion text (how it moves), which jointly guide the synthesis of query-relevant videos. To effectively evaluate uncertain or borderline harmful outputs, we propose RJScore, a novel LLM-based metric that incorporates the confidence of judge models and human-aligned decision threshold calibration. Extensive experiments show that benign-query video composition achieves an average attack success rate of 67.2%, revealing consistent vulnerabilities to video-induced attacks. We believe Video-SafetyBench will catalyze future research into video-based safety evaluation and defense strategies.
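The subject/motion decomposition lends itself to a simple two-stage orchestration layer. Below is a minimal Python sketch of how such a pipeline could be wired, assuming external text-to-image and image-to-video generators behind stub functions; every name here (`decompose_query`, `text_to_image`, `image_to_video`) is illustrative and not the paper's actual API.

```python
from dataclasses import dataclass


@dataclass
class VideoSpec:
    subject_prompt: str  # "what is shown": conditions a text-to-image model
    motion_text: str     # "how it moves": conditions an image-to-video model


def decompose_query(query: str) -> VideoSpec:
    """Stand-in for the LLM step that splits a query's video semantics
    into a static subject description and a motion description."""
    # In practice an LLM would produce both fields; a canned split is shown here.
    return VideoSpec(
        subject_prompt=f"a photorealistic scene depicting: {query}",
        motion_text="the camera slowly pans across the scene",
    )


def text_to_image(prompt: str) -> bytes:
    """Stub for an external text-to-image model (stage 1: subject image)."""
    return b"<image bytes>"


def image_to_video(image: bytes, motion_text: str) -> bytes:
    """Stub for an external image-to-video model (stage 2: animate the subject)."""
    return b"<video bytes>"


def synthesize_video(query: str) -> bytes:
    """Compose the two stages: subject image and motion text jointly guide synthesis."""
    spec = decompose_query(query)
    image = text_to_image(spec.subject_prompt)
    return image_to_video(image, spec.motion_text)
```

Separating the static subject from the motion description is what makes the synthesis controllable: each query-relevant video is pinned to an explicit "what" and "how" rather than a single free-form prompt.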
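To make the idea of a confidence-aware judge score concrete, here is a minimal Python sketch under stated assumptions: the judge model rates harmfulness on a 1 to 5 scale, exposes log-probabilities for the rating token, and the score is the probability-weighted expectation over ratings, compared against a threshold calibrated to human judgments. The function names and the threshold value are illustrative; the paper's exact RJScore formulation may differ.

```python
import math

RATING_TOKENS = {"1", "2", "3", "4", "5"}


def rjscore(rating_logprobs: dict[str, float]) -> float:
    """Expected rating under the judge's token distribution,
    renormalized over the five rating tokens."""
    probs = {int(t): math.exp(lp)
             for t, lp in rating_logprobs.items() if t in RATING_TOKENS}
    total = sum(probs.values())
    return sum(rating * p / total for rating, p in probs.items())


# Hypothetical value; the paper calibrates the threshold against human labels.
DECISION_THRESHOLD = 3.0


def judged_unsafe(rating_logprobs: dict[str, float]) -> bool:
    """Attack counts as successful when the confidence-weighted score
    crosses the calibrated threshold."""
    return rjscore(rating_logprobs) >= DECISION_THRESHOLD


# Example: a judge split between "3" and "4" yields a borderline score (~3.35)
# that a hard argmax over rating tokens would flatten to exactly 3.
print(rjscore({"3": math.log(0.55), "4": math.log(0.40), "2": math.log(0.05)}))
```

Weighting by the judge's token probabilities, rather than taking its single most likely rating, is what lets a metric of this kind grade uncertain or borderline harmful outputs smoothly instead of forcing a hard verdict.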