Video Seal: Open and Efficient Video Watermarking

The proliferation of AI-generated content and sophisticated video editing tools has made moderating digital platforms both more important and more challenging. Video watermarking addresses these challenges by embedding imperceptible signals into videos so that they can later be identified. However, the few open tools and methods that exist often fall short on efficiency, robustness, and flexibility. To narrow these gaps, this paper introduces Video Seal, a comprehensive framework for neural video watermarking, together with a competitive open-source model. Our approach jointly trains an embedder and an extractor, and ensures watermark robustness by applying transformations, e.g., video codecs, between the two during training. Training is multistage and comprises image pre-training, hybrid post-training, and extractor fine-tuning. We also introduce temporal watermark propagation, a technique that converts any image watermarking model into an efficient video watermarking model without the need to watermark every high-resolution frame. We present experimental results demonstrating the effectiveness of the approach in terms of speed, imperceptibility, and robustness. Video Seal achieves higher robustness than strong baselines, especially under challenging distortions that combine geometric transformations and video compression. Additionally, we provide new insights, such as the impact of video compression during training and how to compare methods operating on different payloads. The contributions of this work, including the codebase, models, and a public demo, are open-sourced under permissive licenses to foster further research and development in the field.
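
To make the idea of temporal watermark propagation concrete, the following is a minimal sketch and not the released implementation. It assumes that the image embedder returns an additive residual, that this residual is computed only on every `step`-th frame, and that it is reused unchanged on the frames in between. The names `propagate_watermark`, `embed_residual`, `step`, and `toy_embedder` are hypothetical, and frames are flattened to lists of intensities in [0, 1] purely to keep the example dependency-free.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # one frame, flattened to pixel intensities in [0, 1]


def propagate_watermark(
    frames: Sequence[Frame],
    embed_residual: Callable[[Frame], Frame],
    step: int = 4,
) -> List[Frame]:
    """Watermark a clip while calling the image embedder on every `step`-th frame only.

    Assumption: the embedder yields an additive residual that can be reused
    on neighbouring frames without being recomputed.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    watermarked: List[Frame] = []
    residual: Frame = []
    for index, frame in enumerate(frames):
        if index % step == 0:
            # Key frame: run the (expensive) image embedder.
            residual = embed_residual(frame)
        # Every frame: add the most recent residual, then clip to [0, 1].
        watermarked.append(
            [min(1.0, max(0.0, p + r)) for p, r in zip(frame, residual)]
        )
    return watermarked


# Hypothetical stand-in for a neural embedder: a fixed +/- pattern,
# plus a counter so the number of embedder calls can be inspected.
calls = 0


def toy_embedder(frame: Frame) -> Frame:
    global calls
    calls += 1
    return [0.01 if i % 2 == 0 else -0.01 for i in range(len(frame))]


clip = [[0.5] * 8 for _ in range(10)]  # 10 uniform grey frames of 8 pixels
marked = propagate_watermark(clip, toy_embedder, step=4)
```

In this toy setting the embedder runs only on frames 0, 4, and 8 of the ten-frame clip, while all ten output frames carry the residual. How large `step` can be before robustness or imperceptibility degrades is an empirical question that this sketch does not address.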
