Copyright protection for deep neural networks (DNNs) is an urgent need for AI corporations. To trace illegally distributed model copies, DNN watermarking has emerged as a technique for embedding and verifying secret identity messages in either the prediction behaviors or the internals of a model. The latter branch, called \textit{white-box DNN watermarking}, sacrifices less functionality and exploits more knowledge about the target DNN. It is believed to be accurate, credible, and secure against most known watermark removal attacks, and it has attracted growing research efforts in both academia and industry. In this paper, we present the first systematic study of how mainstream white-box DNN watermarks are commonly vulnerable to neural structural obfuscation with \textit{dummy neurons}, i.e., groups of neurons that can be added to a target model while leaving the model's behavior invariant. Built on a comprehensive framework that automatically generates and injects highly stealthy dummy neurons, our novel attack heavily modifies the architecture of the target model to prevent watermark verification from succeeding. Through extensive evaluation, our work shows for the first time that nine published watermarking schemes require amendments to their verification procedures.
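To make the notion of a dummy neuron concrete, the following Python sketch shows one simple construction under our own assumptions: a two-layer ReLU MLP written in pure Python, with hypothetical helper names `forward` and `inject_dummy_pair`. It is an illustration only, not necessarily how the framework above generates dummy neurons. Two added hidden neurons share identical incoming weights, so their ReLU activations are always equal, and their outgoing weights are negatives of each other, so their contributions to every output cancel. The hidden layer grows from 3 to 5 neurons while the outputs stay the same up to floating-point rounding.

```python
import math
import random


def relu(x):
    return x if x > 0.0 else 0.0


def forward(W1, b1, W2, b2, x):
    """Two-layer MLP: y = W2 @ ReLU(W1 @ x + b1) + b2 (lists of lists)."""
    h = [relu(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(W2, b2)]


def inject_dummy_pair(W1, b1, W2, rng):
    """Hypothetical helper: return (W1, b1, W2) with two extra hidden neurons.

    Both dummy neurons receive the same random incoming weights and bias,
    so their activations are identical for every input; their outgoing
    weights are v and -v, so they cancel in every output.
    """
    n_in = len(W1[0])
    w_in = [rng.uniform(-1.0, 1.0) for _ in range(n_in)]
    b_in = rng.uniform(-1.0, 1.0)
    v = [rng.uniform(-1.0, 1.0) for _ in range(len(W2))]
    new_W1 = [list(row) for row in W1] + [list(w_in), list(w_in)]
    new_b1 = list(b1) + [b_in, b_in]
    new_W2 = [list(row) + [vk, -vk] for row, vk in zip(W2, v)]
    return new_W1, new_b1, new_W2


# Usage: a random 4 -> 3 -> 2 network, before and after injection.
rng = random.Random(0)
W1 = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3)]
b1 = [rng.uniform(-1.0, 1.0) for _ in range(3)]
W2 = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(2)]
b2 = [0.1, -0.2]
W1d, b1d, W2d = inject_dummy_pair(W1, b1, W2, rng)

x = [0.5, -1.0, 2.0, 0.3]
y = forward(W1, b1, W2, b2, x)
y_dummy = forward(W1d, b1d, W2d, b2, x)
print(len(W1), "->", len(W1d), "hidden neurons")
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(y, y_dummy)))
```

A pair is used instead of a single neuron with all-zero outgoing weights because zero columns are trivially detectable. Even so, this pair is still easy to spot (duplicated rows, opposite columns), which is why the stealthiness of the injected neurons matters.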