Model watermarking techniques have advanced significantly in recent years. However, protecting image-processing neural networks remains a challenge, and only a few methods have been developed for this purpose. These methods embed a watermark in the output images of the target generative network, so that the watermark signal can still be detected in the output of a surrogate model obtained through a model extraction attack. This promising approach has limitations, however. Frequency-domain analysis reveals that the watermark signal is concealed mainly in the high-frequency components of the output. Building on this observation, we propose an overwriting attack that forges a second watermark in the output of the generative network. Experimental results show that this attack sabotages existing watermarking schemes for image-processing networks with an almost 100% success rate. To counter the attack, we devise an adversarial framework for the watermarking network. The framework incorporates a specially designed adversarial training step in which the watermarking network is trained to defend against the overwriting network, thereby enhancing its robustness. Additionally, we observe an overfitting phenomenon in the existing watermarking method that can render it ineffective. To address this issue, we modify the training process to eliminate the overfitting.
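The frequency-domain observation in the abstract can be illustrated with a small sketch. The code below is not the authors' analysis: the function names, the `cutoff` value, and the synthetic checkerboard "watermark" are all assumptions chosen for illustration. It measures what fraction of an image's spectral energy lies outside a centered low-frequency disc, and applies that measure to the residual between a marked and an unmarked image.

```python
import numpy as np


def high_freq_energy_ratio(image, cutoff=0.25):
    """Return the fraction of spectral energy beyond `cutoff`.

    `image` is a 2-D float array (grayscale). `cutoff` is a radius in
    normalized frequency units (cycles per pixel); it is a hypothetical
    threshold, not a value taken from the paper.
    """
    # Centered 2-D power spectrum.
    power = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
    h, w = image.shape
    # Normalized frequency coordinates, shifted to match the spectrum.
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    total = power.sum()
    if total == 0:
        return 0.0
    return float(power[radius > cutoff].sum() / total)


def make_demo_images(size=64):
    """Build a smooth 'clean' image and a copy carrying a toy watermark.

    The checkerboard perturbation stands in for a high-frequency
    watermark signal; a real scheme would embed it with a network.
    """
    rows, cols = np.mgrid[0:size, 0:size]
    clean = 0.5 + 0.25 * np.sin(2 * np.pi * cols / size) * np.cos(
        2 * np.pi * rows / size
    )
    mark = 0.01 * ((-1.0) ** (rows + cols))
    return clean, clean + mark


if __name__ == "__main__":
    clean, marked = make_demo_images()
    print("clean image:", high_freq_energy_ratio(clean))
    print("residual   :", high_freq_energy_ratio(marked - clean))
```

On real data, the residual would be the difference between the watermarked output and the unmarked output of the generative network. A ratio close to 1 for the residual alongside a low ratio for the image itself would be consistent with a watermark that lives mostly in high frequencies, which is the property the overwriting attack relies on.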