Ownership verification for neural networks is important for protecting these models from illegal copying, free-riding, redistribution, and other intellectual property misuse. We present a novel methodology for neural network ownership verification based on the notion of latent watermarks. Existing ownership verification methods fall into two groups. The first modifies or constrains the network parameters, which are accessible to an attacker in a white-box attack; such changes can also harm the network's normal operation. The second trains the network to respond to specific watermarks in the inputs, similar to data poisoning-based backdoor attacks, and is therefore susceptible to backdoor removal techniques. In this paper, we address these problems by decoupling a network's normal operation from its responses to watermarked inputs during ownership verification. The key idea is to train the network such that the watermarks remain dormant unless the owner's secret key is applied to activate them. The secret key is realized as a specific perturbation to the network's parameters that is known only to the owner. We show that our approach offers strong defense against backdoor detection, backdoor removal, and surrogate model attacks. In addition, our method provides protection against ambiguity attacks, in which the attacker either tries to guess the secret weight key or uses fine-tuning to embed their own watermarks with a different key into a pre-trained neural network. Experimental results demonstrate the advantages and effectiveness of our proposed approach.
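As a rough illustration of the verification step described in the abstract, the following minimal sketch shows a watermark that stays dormant under the released weights and becomes active only when a key perturbation is added to them. It is not the method from the paper: the linear model, its weights, the key, and the names `predict`, `watermark_success_rate`, `released_weights`, and `secret_key` are all hypothetical and constructed by hand, whereas the abstract describes training the network to obtain this behavior.

```python
import numpy as np


def predict(weights, inputs):
    """Return the arg-max class of a linear model for each row of inputs."""
    return np.argmax(inputs @ weights.T, axis=1)


def watermark_success_rate(weights, watermarked_inputs, target_label, key=None):
    """Fraction of watermarked inputs classified as target_label.

    If key is given, it is added to the weights first (the hypothetical
    secret-key perturbation); otherwise the released weights are used as-is.
    """
    effective = weights if key is None else weights + key
    hits = predict(effective, watermarked_inputs) == target_label
    return float(np.mean(hits))


# Hand-built toy model (assumption): 3 classes, 4 features. Feature 3 is the
# watermark channel; the released weights ignore it, so the watermark is dormant.
released_weights = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])

# Hypothetical secret key: a weight perturbation that makes class 2 respond
# strongly to the watermark feature.
secret_key = np.zeros_like(released_weights)
secret_key[2, 3] = 5.0

# Inputs that would normally be classified as 0 and 1, with the watermark set.
watermarked = np.array([
    [1.0, 0.2, 0.1, 1.0],
    [0.3, 1.0, 0.0, 1.0],
])

dormant_rate = watermark_success_rate(released_weights, watermarked, target_label=2)
active_rate = watermark_success_rate(
    released_weights, watermarked, target_label=2, key=secret_key
)
print(f"without key: {dormant_rate:.2f}, with key: {active_rate:.2f}")
```

In this toy setting, an owner would claim ownership by showing that the success rate on watermarked inputs jumps once the key is applied, while anyone inspecting the released weights alone sees ordinary predictions.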