Recently, video-based large language models (video-based LLMs) have achieved impressive performance across various video comprehension tasks. However, this rapid advancement raises significant privacy and security concerns, particularly regarding the unauthorized use of personal video data for automated annotation by video-based LLMs. The resulting unauthorized video-text pairs can then be used to improve performance on downstream tasks such as text-to-video generation. To safeguard personal videos from unauthorized use, we propose two series of protective video watermarks built from imperceptible adversarial perturbations, named Ramblings and Mutes. Concretely, Ramblings misleads video-based LLMs into generating inaccurate captions for the videos, degrading annotation quality through inconsistencies between video content and captions. Mutes, on the other hand, prompts video-based LLMs to produce exceptionally brief captions that lack descriptive detail. Extensive experiments demonstrate that our video watermarking methods effectively protect video data by significantly reducing annotation performance across various video-based LLMs, showcasing both stealthiness and robustness in protecting personal video content. Our code is available at https://github.com/ttthhl/Protecting_Your_Video_Content.
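To illustrate the general shape of such a protective watermark, the following is a minimal, hedged sketch of a generic PGD-style perturbation under an L∞ imperceptibility budget. It is not the paper's actual Ramblings or Mutes algorithm: the function names (`pgd_watermark`, `grad_fn`) and the loss are illustrative placeholders; in the real setting, `grad_fn` would return the gradient of a caption-degradation objective computed through a video-based LLM.

```python
import numpy as np

def pgd_watermark(video, grad_fn, eps=8 / 255, alpha=1 / 255, steps=10):
    """Sketch of projected gradient ascent on an attack loss.

    video:   float array of pixel values in [0, 1] (e.g. T x H x W x C).
    grad_fn: callable returning the gradient of the attack loss w.r.t.
             the perturbed video (illustrative stand-in for a gradient
             backpropagated through a video-based LLM's caption loss).
    eps:     L-infinity budget that keeps the watermark imperceptible.
    """
    delta = np.zeros_like(video)
    for _ in range(steps):
        # Signed gradient step, as in standard PGD.
        delta = delta + alpha * np.sign(grad_fn(video + delta))
        # Project back into the imperceptibility budget.
        delta = np.clip(delta, -eps, eps)
        # Keep perturbed pixels in the valid [0, 1] range.
        delta = np.clip(video + delta, 0.0, 1.0) - video
    return video + delta

# Toy usage with an analytic "loss" gradient (no LLM involved):
# maximizing <w, x> pushes pixels along the fixed direction w.
rng = np.random.default_rng(0)
video = rng.random((2, 4, 4, 3))          # tiny 2-frame "video"
w = rng.standard_normal(video.shape)
watermarked = pgd_watermark(video, grad_fn=lambda x: w)
```

The L∞ clip is what makes the perturbation a plausible "watermark": each pixel moves by at most `eps`, so the protected video remains visually indistinguishable from the original while the optimization objective steers the model's captions.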