On Function-Coupled Watermarks for Deep Neural Networks

High-performing deep neural networks (DNNs) generally require massive amounts of labelled data and computational resources to train. Various watermarking techniques have been proposed to protect such intellectual property (IP): the DNN provider implants secret information into the model and can later claim ownership by retrieving the embedded watermark with dedicated trigger inputs. While promising results are reported in the literature, existing solutions remain vulnerable to watermark removal attacks such as model fine-tuning and model pruning. In this paper, we propose a novel DNN watermarking solution that can effectively defend against these attacks. Our key insight is to strengthen the coupling between the watermark and the model's functionality, so that removing the watermark inevitably degrades the model's performance on normal inputs. To this end, unlike previous methods that rely on secret features learnt from out-of-distribution data, our method uses only features learnt from in-distribution data. Specifically, we first sample inputs from the original training dataset and fuse them into watermark triggers. Second, we randomly mask model weights during training so that the information of the embedded watermark spreads throughout the network. As a result, fine-tuning or pruning the model does not erase our function-coupled watermarks. Evaluation results on various image classification tasks show a 100% watermark authentication success rate under aggressive watermark removal attacks, significantly outperforming existing solutions. Code is available at https://github.com/cure-lab/Function-Coupled-Watermark.
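The two ingredients named in the abstract, fusing in-distribution samples into a trigger and randomly masking weights during training, can be illustrated with a minimal NumPy sketch. This is not the released implementation: the blending rule, the `alpha` parameter, the Bernoulli masking scheme, the function names `fuse_trigger` and `random_weight_mask`, and the random stand-in data are all assumptions made for illustration only.

```python
import numpy as np


def fuse_trigger(img_a, img_b, alpha=0.5):
    """Blend two training images into one trigger.

    Assumption: a convex pixel-wise blend; the abstract only says that
    sampled training inputs are "fused", not how.
    """
    if img_a.shape != img_b.shape:
        raise ValueError("images must share a shape")
    return alpha * img_a + (1.0 - alpha) * img_b


def random_weight_mask(weights, drop_prob, rng):
    """Zero a random subset of weights for one training step.

    Assumption: an independent Bernoulli mask per weight. Returns the
    masked weights and the boolean keep-mask that was applied.
    """
    keep = rng.random(weights.shape) >= drop_prob
    return weights * keep, keep


rng = np.random.default_rng(0)

# Stand-in for an in-distribution training set: 10 RGB images in [0, 1).
train_images = rng.random((10, 32, 32, 3))

# Sample two distinct training images and fuse them into a trigger.
i, j = rng.choice(len(train_images), size=2, replace=False)
trigger = fuse_trigger(train_images[i], train_images[j])

# Mask a hypothetical layer's weights before a forward pass.
weights = rng.standard_normal((64, 32))
masked, keep = random_weight_mask(weights, drop_prob=0.3, rng=rng)
```

In an actual training loop one would presumably draw a fresh mask at each step, so that no fixed subset of weights carries the watermark on its own; that reading follows from the abstract's stated goal of spreading the watermark information across the network.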
