Video generative models achieve high-quality synthesis from natural-language prompts by leveraging large-scale web data. However, this training paradigm inherently exposes them to unsafe biases and harmful concepts, introducing the risk of generating undesirable or illicit content. To mitigate unsafe generations, existing machine unlearning approaches either rely on filtering, which can be bypassed, or update model weights through either costly fine-tuning or training-free closed-form edits. We propose the first training-free weight-update framework for concept removal in video diffusion models. From five paired safe/unsafe prompts, our method estimates a refusal vector and integrates it into the model weights as a closed-form update. A contrastive low-rank factorization further disentangles the target concept from unrelated semantics, ensuring selective concept suppression without harming generation quality. Our approach reduces unsafe generations on the Open-Sora and ZeroScopeT2V models across the T2VSafetyBench and SafeSora benchmarks, with average reductions of 36.3% and 58.2%, respectively, while preserving prompt alignment and video quality. This establishes an efficient and scalable solution for safe video generation without retraining or inference overhead. Project page: https://www.pinlab.org/video-unlearning.
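To make the two ingredients named above more concrete (a refusal direction estimated from a handful of safe/unsafe prompt pairs, and a closed-form weight edit), the following is a minimal PyTorch sketch. It is not the authors' implementation: the function names, tensor shapes, and pooled-embedding interface are hypothetical, and the paper's contrastive low-rank factorization is replaced here by a plain rank-1 projection for brevity.

```python
import torch

def estimate_refusal_vector(safe_emb, unsafe_emb):
    # safe_emb, unsafe_emb: (n_pairs, d) pooled prompt embeddings from the text encoder.
    # The averaged paired difference points from safe toward unsafe semantics.
    diff = (unsafe_emb - safe_emb).mean(dim=0)
    return diff / diff.norm()

def rank1_refusal_edit(W, refusal, scale=1.0):
    # W: (d_out, d_in) cross-attention projection weight; refusal: (d_in,) unit vector.
    # Closed-form edit: suppress the component of W that acts along the refusal direction.
    r = refusal.unsqueeze(1)               # (d_in, 1)
    return W - scale * (W @ (r @ r.t()))   # rank-1 projector onto the refusal direction

# Toy usage with random stand-ins for real embeddings and weights.
torch.manual_seed(0)
safe, unsafe = torch.randn(5, 768), torch.randn(5, 768)   # five safe/unsafe prompt pairs
v = estimate_refusal_vector(safe, unsafe)
W_edited = rank1_refusal_edit(torch.randn(320, 768), v)
```

Because the edit is applied once to the weights, inference proceeds exactly as in the original model, which is why the approach incurs no runtime overhead.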