Invisible watermarking is essential for tracing the provenance of digital content. However, training state-of-the-art models remains notoriously difficult, and current approaches often struggle to balance robustness against true imperceptibility. This work introduces Pixel Seal, which sets a new state of the art for image and video watermarking. We begin by identifying three fundamental issues with existing methods: (i) reliance on proxy perceptual losses such as MSE and LPIPS, which fail to mimic human perception and result in visible watermark artifacts; (ii) optimization instability caused by conflicting objectives, which necessitates exhaustive hyperparameter tuning; and (iii) reduced robustness and imperceptibility of watermarks when scaling models to high-resolution images and videos. To overcome these issues, we first propose an adversarial-only training paradigm that eliminates unreliable pixel-wise imperceptibility losses. Second, we introduce a three-stage training schedule that stabilizes convergence by decoupling robustness and imperceptibility. Third, we address the resolution gap via high-resolution adaptation, employing JND-based attenuation and training-time inference simulation to eliminate upscaling artifacts. We thoroughly evaluate the robustness and imperceptibility of Pixel Seal on different image types and across a wide range of transformations, and show clear improvements over the prior state of the art. Finally, we demonstrate that the model adapts efficiently to video via temporal watermark pooling, positioning Pixel Seal as a practical and scalable solution for reliable provenance in real-world image and video settings.
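To make the idea of JND-based attenuation concrete, the following is a minimal sketch of how a watermark residual might be scaled by a just-noticeable-difference (JND) map before being added to an image. It is not the Pixel Seal implementation: the function names (`local_activity`, `jnd_proxy`, `attenuate_and_embed`), the parameters (`floor`, `gain`, `strength`), and the crude local-contrast heuristic standing in for a real JND model are all assumptions made for illustration. Images are plain nested lists of grayscale values in [0, 1].

```python
def local_activity(img, y, x):
    """Mean absolute difference between a pixel and its 8 neighbours.

    Out-of-bounds neighbours are clamped to the nearest valid pixel.
    """
    h, w = len(img), len(img[0])
    total = 0.0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            ny = min(max(y + dy, 0), h - 1)
            nx = min(max(x + dx, 0), w - 1)
            total += abs(img[ny][nx] - img[y][x])
    return total / 8.0


def jnd_proxy(img, floor=0.1, gain=4.0):
    """Hypothetical JND stand-in: a per-pixel mask in [floor, 1].

    Flat regions get a small value (changes are easy to see there);
    textured regions get a value near 1 (changes are masked by texture).
    """
    h, w = len(img), len(img[0])
    return [
        [min(1.0, floor + gain * local_activity(img, y, x)) for x in range(w)]
        for y in range(h)
    ]


def attenuate_and_embed(img, residual, strength=0.05):
    """Add the residual to the image, scaled per pixel by the JND proxy."""
    mask = jnd_proxy(img)
    out = []
    for y in range(len(img)):
        row = []
        for x in range(len(img[0])):
            value = img[y][x] + strength * mask[y][x] * residual[y][x]
            row.append(min(1.0, max(0.0, value)))  # keep pixels in [0, 1]
        out.append(row)
    return out
```

Under these assumptions, a constant image receives only a tenth of the nominal watermark strength, while a checkerboard-like texture receives the full strength, which is the qualitative behavior that perceptual attenuation is meant to provide.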