Content watermarking is an important tool for the authentication and copyright protection of digital media. However, it is unclear whether existing watermarks are robust against adversarial attacks. We present the winning solution to the NeurIPS 2024 Erasing the Invisible challenge, which stress-tests watermark robustness under varying degrees of adversary knowledge. The challenge consisted of two tracks, a black-box track and a beige-box track, which differ in whether the adversary knows which watermarking method the provider used. For the beige-box track, we leverage an adaptive VAE-based evasion attack, with test-time optimization and color-contrast restoration in CIELAB space to preserve image quality. For the black-box track, we first cluster images based on their artifacts in the spatial or frequency domain. Then, to each cluster, we apply image-to-image diffusion models with controlled noise injection and semantic priors from ChatGPT-generated captions, using optimized parameter settings. Empirical evaluations demonstrate that our method achieves near-perfect watermark removal (95.7%) with negligible impact on the quality of the resulting images. We hope that our attacks inspire the development of more robust image watermarking methods.
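The abstract does not say how the color-contrast restoration in CIELAB space is carried out. As a rough illustration only, the sketch below assumes one plausible variant: converting both images to CIELAB and rescaling each channel of the attacked image so that its mean and standard deviation match those of the original. The function names (`rgb_to_lab`, `lab_to_rgb`, `restore_color_contrast`) and the statistic-matching rule are assumptions made for this example, not the authors' implementation. Pixels are sRGB triples in [0, 1].

```python
from statistics import mean, pstdev

# D65 reference white (2-degree observer), used to normalise XYZ.
WHITE = (0.95047, 1.0, 1.08883)
DELTA = 6 / 29


def _srgb_to_linear(c):
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _linear_to_srgb(c):
    c = min(max(c, 0.0), 1.0)  # clip out-of-gamut values
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055


def _f(t):
    return t ** (1 / 3) if t > DELTA ** 3 else t / (3 * DELTA ** 2) + 4 / 29


def _f_inv(t):
    return t ** 3 if t > DELTA else 3 * DELTA ** 2 * (t - 4 / 29)


def rgb_to_lab(rgb):
    """Convert one sRGB pixel (values in [0, 1]) to CIELAB."""
    r, g, b = (_srgb_to_linear(c) for c in rgb)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    fx, fy, fz = (_f(v / w) for v, w in zip((x, y, z), WHITE))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def lab_to_rgb(lab):
    """Convert one CIELAB pixel back to sRGB, clipping to [0, 1]."""
    lightness, a, b = lab
    fy = (lightness + 16) / 116
    fx, fz = fy + a / 500, fy - b / 200
    x, y, z = (w * _f_inv(v) for v, w in zip((fx, fy, fz), WHITE))
    r = 3.2404542 * x - 1.5371385 * y - 0.4985314 * z
    g = -0.9692660 * x + 1.8760108 * y + 0.0415560 * z
    bl = 0.0556434 * x - 0.2040259 * y + 1.0572252 * z
    return tuple(_linear_to_srgb(c) for c in (r, g, bl))


def restore_color_contrast(attacked, reference):
    """Match per-channel Lab mean/std of `attacked` to `reference` (assumed rule)."""
    att = [rgb_to_lab(p) for p in attacked]
    ref = [rgb_to_lab(p) for p in reference]
    channels = []
    for ch in range(3):
        a_vals = [p[ch] for p in att]
        r_vals = [p[ch] for p in ref]
        a_mu, r_mu = mean(a_vals), mean(r_vals)
        a_sd, r_sd = pstdev(a_vals), pstdev(r_vals)
        # Leave a (nearly) flat channel unscaled to avoid dividing by ~0.
        scale = r_sd / a_sd if a_sd > 1e-6 else 1.0
        channels.append([(v - a_mu) * scale + r_mu for v in a_vals])
    return [lab_to_rgb(lab) for lab in zip(*channels)]
```

In this sketch, `reference` would be the watermarked input and `attacked` the output of the evasion step, each flattened to a list of pixels. Matching statistics in CIELAB rather than RGB keeps lightness and the two chroma axes separate, so a loss of contrast can be corrected without shifting hue.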