Recent text-to-video (T2V) models can synthesize complex videos from lightweight natural language prompts, raising urgent concerns about their safety alignment should they be misused in the real world. Prior jailbreak attacks typically rewrite unsafe prompts into paraphrases that evade content filters while preserving meaning. Yet these approaches often still retain explicit sensitive cues in the input text, and therefore overlook a deeper, video-specific weakness. In this paper, we identify a temporal trajectory infilling vulnerability of T2V systems under fragmented prompts: when the prompt specifies only sparse boundary conditions (e.g., start and end frames) and leaves the intermediate evolution underspecified, the model may autonomously reconstruct a plausible trajectory that includes harmful intermediate frames, even though the prompt appears benign to input- or output-side filters. Building on this observation, we propose TFM, a fragmented-prompting framework that converts an originally unsafe request into a temporally sparse two-frame extraction and further reduces overtly sensitive cues via implicit substitution. Extensive evaluations across multiple open-source and commercial T2V models demonstrate that TFM consistently enhances jailbreak effectiveness, achieving up to a 12% increase in attack success rate on commercial systems. Our findings highlight the need for temporally aware safety mechanisms that account for model-driven completion beyond the surface form of the prompt.