Deep neural networks (DNNs) are valuable assets because of their commercial benefits and because training them demands costly annotation and computation resources. To protect the copyright of DNNs, backdoor-based ownership verification has recently become popular: the model owner watermarks the model by embedding a specific backdoor behavior before releasing it. Defenders (usually the model owners) can then identify whether a suspicious third-party model was ``stolen'' from them based on the presence of that behavior. Unfortunately, these watermarks have been shown to be vulnerable to removal attacks, even ones as simple as fine-tuning. To further explore this vulnerability, we investigate the parameter space and find that many watermark-removed models exist in the vicinity of the watermarked one, which removal attacks may easily exploit. Inspired by this finding, we propose a mini-max formulation that finds these watermark-removed models and recovers their watermark behavior. Extensive experiments demonstrate that our method improves the robustness of model watermarking against parametric changes and numerous watermark-removal attacks. The code for reproducing our main experiments is available at \url{https://github.com/GuanhaoGan/robust-model-watermarking}.
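The mini-max idea described above can be sketched as follows. This is only an illustrative form under assumed notation, not necessarily the exact objective used in the paper: $\theta$ denotes the model parameters, $\delta$ a parameter perturbation bounded by a hypothetical budget $\epsilon$, $\mathcal{D}_c$ and $\mathcal{D}_w$ the clean and watermark samples, $\mathcal{L}$ a training loss, and $\alpha$ a trade-off weight.

\begin{equation}
\min_{\theta} \; \mathcal{L}(\theta; \mathcal{D}_c) + \alpha \max_{\|\delta\| \le \epsilon} \mathcal{L}(\theta + \delta; \mathcal{D}_w)
\end{equation}

Under these assumptions, the inner maximization searches the vicinity of $\theta$ for parameters at which the watermark behavior is lost, and the outer minimization updates $\theta$ so that the watermark is recovered there while the loss on clean samples stays low.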