Recent years have seen a surge of interest in digital content watermarking, driven by the proliferation of generative models and increased legal pressure. As the share of AI-generated content online keeps growing, watermarking plays an increasingly important role in ensuring content authenticity and attribution at scale. Many works have assessed the robustness of watermarking to removal attacks, yet watermark forging, the scenario in which a watermark is stolen from genuine content and applied to malicious content, remains underexplored. In this work, we investigate watermark forging in the context of widely used post-hoc image watermarking. Our contributions are as follows. First, we introduce a preference model to assess whether an image is watermarked. The model is trained with a ranking loss on purely procedurally generated images, without any need for real watermarks. Second, we demonstrate the model's capability to remove and forge watermarks by optimizing the input image through backpropagation. This technique requires only a single watermarked image and works without knowledge of the watermarking model, making our attack much simpler and more practical than attacks introduced in related work. Third, we evaluate our proposed method on a variety of post-hoc image watermarking models, demonstrating that our approach can effectively forge watermarks and calling into question the security of current watermarking approaches. Our code and further resources are publicly available.
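The two technical steps named in the abstract, training a scorer with a ranking loss on procedurally generated pairs and then optimizing the input image against that scorer, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the "image" is a flat list of 16 pixels, the preference model is a linear scorer (so the input gradient is written by hand instead of being obtained through backpropagation in a neural network), the procedural artifact is a fixed alternating pattern, and forging is done by transplanting the residual recovered during removal, which is only one plausible reading of the abstract. All function and variable names are hypothetical.

```python
import math
import random

SIZE = 16  # toy "image": a flat list of 16 pixel values in [0, 1]


def score(weights, image):
    """Toy linear preference score; higher means 'looks more watermarked'."""
    return sum(w * x for w, x in zip(weights, image))


def ranking_loss(weights, marked, clean):
    """Pairwise logistic ranking loss that prefers score(marked) > score(clean)."""
    margin = score(weights, marked) - score(weights, clean)
    return math.log1p(math.exp(-margin))


def train_step(weights, marked, clean, lr=0.1):
    """One gradient-descent step on the ranking loss with respect to the weights."""
    margin = score(weights, marked) - score(weights, clean)
    grad_margin = -1.0 / (1.0 + math.exp(margin))  # d loss / d margin
    return [w - lr * grad_margin * (m - c) for w, m, c in zip(weights, marked, clean)]


def clip(x):
    """Keep a pixel value inside the valid range [0, 1]."""
    return min(1.0, max(0.0, x))


def remove_watermark(weights, image, steps=10, lr=0.02):
    """Descend the preference score with respect to the input pixels."""
    img = list(image)
    for _ in range(steps):
        # For a linear scorer, d score / d pixel_i is simply weights[i].
        img = [clip(x - lr * w) for x, w in zip(img, weights)]
    return img


rng = random.Random(0)

# Hypothetical procedural artifact standing in for synthetic training perturbations.
pattern = [0.05 if i % 2 == 0 else -0.05 for i in range(SIZE)]


def random_image():
    """Procedurally generated stand-in for a clean image."""
    return [rng.uniform(0.2, 0.8) for _ in range(SIZE)]


# Train the scorer on (perturbed, clean) pairs; no real watermark is involved.
weights = [0.0] * SIZE
for _ in range(200):
    clean = random_image()
    marked = [c + p for c, p in zip(clean, pattern)]
    weights = train_step(weights, marked, clean)

# Attack with a single "watermarked" image.
source_clean = random_image()
source_marked = [c + p for c, p in zip(source_clean, pattern)]
cleaned = remove_watermark(weights, source_marked)

# Assumed forging step: move the recovered residual onto an unrelated image.
residual = [m - c for m, c in zip(source_marked, cleaned)]
target = random_image()
forged = [clip(t + r) for t, r in zip(target, residual)]
```

In this sketch the scorer never sees the watermarking model itself, only pairs ranked by which one carries the synthetic artifact. A realistic version would replace the linear scorer with a neural network and compute the input gradient with an automatic differentiation library.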