DNN-based watermarking methods have advanced rapidly, and the ``Encoder-Noise Layer-Decoder'' (END) framework is the most widely used. End-to-end training requires the noise layer in this framework to be differentiable. However, real-world distortions are often non-differentiable, which makes end-to-end training difficult. Existing solutions treat the distortion perturbation only as additive noise, which does not fully integrate the effect of the distortion into training. To better incorporate non-differentiable distortions into training, we propose a novel dual-decoder architecture (END$^2$). Unlike the conventional END architecture, our method employs two structurally identical decoders: the Teacher Decoder, which processes clean watermarked images, and the Student Decoder, which handles distortion-perturbed images. The gradient is backpropagated only through the Teacher Decoder branch to optimize the encoder, thus bypassing the problem of non-differentiability. To ensure resistance to arbitrary distortions, we align the two decoders' feature representations by maximizing the cosine similarity between their intermediate vectors on a hypersphere. Extensive experiments demonstrate that our scheme outperforms state-of-the-art algorithms under various non-differentiable distortions. Moreover, even without the differentiability constraint, our method surpasses baselines with a differentiable noise layer. Our approach is effective and easy to implement across all END architectures, enhancing practicality and generalizability.
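The alignment objective described above can be illustrated with a minimal sketch. It assumes the intermediate vectors of the Teacher and Student Decoders are available as plain lists of floats, and it assumes a loss of the form $1 - \cos(\cdot,\cdot)$; the abstract only states that cosine similarity is maximized on a hypersphere, so the exact loss form and the function names `l2_normalize` and `cosine_alignment_loss` are hypothetical and used here for illustration only.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Project a vector onto the unit hypersphere (unit L2 norm)."""
    norm = math.sqrt(sum(x * x for x in v))
    # eps guards against division by zero for an all-zero vector.
    return [x / max(norm, eps) for x in v]


def cosine_alignment_loss(
    teacher_feat: Sequence[float], student_feat: Sequence[float]
) -> float:
    """Return 1 - cos(teacher, student).

    Assumed loss form (not taken from the paper): minimizing this value
    maximizes the cosine similarity between the two feature vectors.
    """
    t = l2_normalize(teacher_feat)
    s = l2_normalize(student_feat)
    # On the unit hypersphere the dot product equals the cosine similarity.
    return 1.0 - sum(a * b for a, b in zip(t, s))
```

Under this sketch, two feature vectors that point in the same direction give a loss of 0 regardless of their magnitudes, orthogonal vectors give 1, and opposite vectors give 2. In an actual END$^2$ implementation the same quantity would be computed on batched tensors in a deep learning framework.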