Deep Learning (DL) models have become central to digital transformation, which raises concerns about protecting their intellectual property (IP). Various watermarking techniques have been developed to protect Deep Neural Networks (DNNs) from IP infringement, giving rise to a competitive field of DNN watermarking and watermark-removal methods. The predominant watermarking schemes are white-box techniques, which modify the model's weights to embed a unique signature in specific DNN layers. Existing attacks on white-box watermarking, in contrast, usually require knowledge of the specific watermarking scheme deployed or access to the underlying data for further training and fine-tuning. We propose DeepEclipse, a novel and unified framework for removing white-box watermarks. It relies on obfuscation techniques that differ significantly from existing white-box watermark removal schemes. DeepEclipse evades watermark detection without prior knowledge of the underlying watermarking scheme, additional data, or training and fine-tuning. Our evaluation shows that DeepEclipse breaks multiple white-box watermarking schemes, reducing watermark detection to random guessing while keeping model accuracy similar to that of the original model. Our framework thus offers a promising approach to the ongoing challenges in DNN watermark protection and removal.
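To make the setting concrete, the following minimal sketch illustrates a generic weight-based white-box watermark and why a function-preserving change to the weights can disturb its extraction. It is not DeepEclipse's method, which the abstract does not detail. Everything in it is an assumption made for illustration: the toy two-layer ReLU network, the random projection key `X`, the 64-bit `signature`, and the closed-form minimum-norm embedding, which stands in for the training-time embedding that real schemes typically use. The obfuscation shown is a plain permutation of hidden neurons, chosen only because it needs no data and no retraining.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer ReLU network; sizes are hypothetical.
n_in, n_hidden, n_out = 16, 32, 4
W1 = rng.normal(size=(n_hidden, n_in))
W2 = rng.normal(size=(n_out, n_hidden))


def forward(W1, W2, x):
    """Network output for a single input vector x."""
    return W2 @ np.maximum(W1 @ x, 0.0)


# Owner's secret key (assumed form): signature bits and a random projection.
n_bits = 64
signature = rng.integers(0, 2, size=n_bits)
X = rng.normal(size=(n_bits, W1.size))


def extract(W1):
    """Read the watermark as the signs of the projected, flattened weights."""
    return (X @ W1.ravel() > 0).astype(int)


def embed(W1, margin=1.0):
    """Minimum-norm weight update so that X @ w hits the signed targets.

    Illustrative stand-in for embedding via a training-time regularizer.
    """
    target = margin * (2 * signature - 1)
    delta = np.linalg.pinv(X) @ (target - X @ W1.ravel())
    return W1 + delta.reshape(W1.shape)


W1_marked = embed(W1)
ber_marked = float(np.mean(extract(W1_marked) != signature))

# Function-preserving obfuscation: reorder hidden neurons consistently
# in both layers. No data, training, or knowledge of the key is needed.
perm = rng.permutation(n_hidden)
W1_obf, W2_obf = W1_marked[perm], W2[:, perm]
ber_obf = float(np.mean(extract(W1_obf) != signature))

x = rng.normal(size=n_in)
same_output = np.allclose(forward(W1_marked, W2, x),
                          forward(W1_obf, W2_obf, x))

print(f"bit error rate, marked model:     {ber_marked:.2f}")
print(f"bit error rate, obfuscated model: {ber_obf:.2f}")
print(f"outputs unchanged: {same_output}")
```

In this toy setup the marked model reproduces the signature exactly, while the permuted model computes the same function but yields extracted bits that are largely uncorrelated with the signature. That is the sense in which detection can fall toward random guessing without any loss of accuracy.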