Deep Neural Networks (DNNs) have gained considerable traction in recent years owing to the unparalleled results they have achieved. However, training such sophisticated models is resource intensive, leading many to regard DNNs as the intellectual property (IP) of their owners. In this era of cloud computing, high-performance DNNs are often deployed across the internet for public access. As a result, DNN watermarking schemes, especially backdoor-based watermarks, have been actively developed in recent years to protect proprietary rights. Nonetheless, considerable uncertainty remains about the robustness of existing backdoor watermark schemes, both against adversarial attacks and against unintended modifications such as fine-tuning. One reason is that backdoor-based watermarks offer no complete guarantee of robustness. In this paper, we extensively evaluate the persistence of recent backdoor-based watermarks in neural networks under fine-tuning, and we propose a novel data-driven approach to restore the watermark after fine-tuning without exposing the trigger set. Our empirical results show that solely by introducing training data after fine-tuning, the watermark can be restored, provided the model parameters do not shift dramatically during fine-tuning. Depending on the type of trigger samples used, trigger accuracy can be restored to as much as 100%. We further examine how the restoration process works using loss landscape visualization, and explore the idea of introducing training data during the fine-tuning stage to alleviate watermark vanishing.