Integrating watermarking into the generation process of latent diffusion models (LDMs) simplifies detection and attribution of generated content. Semantic watermarks, such as Tree-Rings and Gaussian Shading, represent a novel class of watermarking techniques that are easy to implement and highly robust against various perturbations. However, our work demonstrates a fundamental security vulnerability in semantic watermarks. We show that attackers can leverage unrelated models, even ones with different latent spaces and architectures (UNet vs. DiT), to perform powerful and realistic forgery attacks. Specifically, we design two watermark forgery attacks. The first imprints a targeted watermark into real images by manipulating the latent representation of an arbitrary image in an unrelated LDM so that it moves closer to the latent representation of a watermarked image. We also show that this technique can be used for watermark removal. The second attack generates new images with the target watermark by inverting a watermarked image and re-generating it with an arbitrary prompt. Both attacks require only a single reference image with the target watermark. Overall, our findings call into question the applicability of semantic watermarks by revealing that attackers can easily forge or remove these watermarks under realistic conditions.
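The idea behind the first (imprinting) attack can be illustrated with a toy sketch. It is not the authors' implementation: the linear map `W` is a hypothetical stand-in for the encoder of the attacker's unrelated LDM, and the per-pixel budget `eps`, the step size `lr`, and the squared-distance objective are assumptions chosen only to keep the example small. The sketch nudges a cover image until its latent moves closer to the latent of a single watermarked reference image.

```python
import random

def encode(W, x):
    """Toy linear stand-in for an LDM image encoder: z = W x (assumption)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def latent_distance(W, x, z_target):
    """Squared L2 distance between encode(x) and the target latent."""
    return sum((zi - ti) ** 2 for zi, ti in zip(encode(W, x), z_target))

def imprint(W, x_cover, z_target, steps=200, lr=0.02, eps=0.5):
    """Projected gradient descent on the image so that its latent approaches
    z_target, keeping every pixel within eps of the cover image.
    The eps budget is an illustrative assumption, not taken from the source."""
    x = list(x_cover)
    for _ in range(steps):
        residual = [zi - ti for zi, ti in zip(encode(W, x), z_target)]
        # Gradient of ||W x - z_target||^2 with respect to x is 2 W^T residual.
        grad = [2 * sum(W[i][j] * residual[i] for i in range(len(W)))
                for j in range(len(x))]
        x = [xj - lr * gj for xj, gj in zip(x, grad)]
        # Project back into the eps-box around the cover image.
        x = [min(max(xj, cj - eps), cj + eps) for xj, cj in zip(x, x_cover)]
    return x

random.seed(0)
n, m = 8, 4  # toy "image" and "latent" sizes
W = [[random.gauss(0, 1) / n ** 0.5 for _ in range(n)] for _ in range(m)]
x_watermarked = [random.random() for _ in range(n)]  # single reference image
x_cover = [random.random() for _ in range(n)]        # arbitrary real image
z_target = encode(W, x_watermarked)

x_forged = imprint(W, x_cover, z_target)
before = latent_distance(W, x_cover, z_target)
after = latent_distance(W, x_forged, z_target)
print(f"latent distance before: {before:.4f}, after: {after:.4f}")
```

In this toy setting the forged image stays within `eps` of the cover image while its latent distance to the reference shrinks. A real attack would replace `encode` with the attacker's proxy LDM, and one plausible reading of the removal variant is that it runs a similar optimization away from the watermarked latent rather than toward it.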