To ensure the responsible distribution and use of open-source deep neural networks (DNNs), DNN watermarking has become a crucial technique for tracing and verifying unauthorized model replication or misuse. In practice, black-box watermarks manifest as specific predictive behaviors on specially crafted samples. However, because DNNs generalize, the keys that extract the watermark message are not unique, which gives attackers more opportunities. Advanced attack techniques can reverse-engineer approximate replacements for the original watermark keys and use them to remove the watermark. In this paper, we study the specificity of black-box DNN watermarks, which refers to how accurately a watermark responds to a key. Using this concept, we introduce Specificity-Enhanced Watermarking (SEW), a new method that improves specificity by reducing the association between the watermark and approximate keys. Extensive evaluation on three popular watermarking benchmarks validates that enhancing specificity significantly strengthens robustness against removal attacks. SEW effectively defends against six state-of-the-art removal attacks while maintaining model usability and watermark verification performance.
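To make the notion of specificity concrete, the following is a minimal sketch of one possible proxy, not the metric or method used in the paper. It assumes a toy setting in which a key is an additive trigger vector and the "model" fires the watermark label whenever an input lies within a fixed radius of the key. The names `make_toy_model`, `watermark_response`, and `specificity_gap`, as well as the Gaussian perturbation used to form approximate keys, are all hypothetical choices made for illustration.

```python
import numpy as np


def make_toy_model(key, radius):
    """Hypothetical stand-in for a watermarked classifier.

    Returns label 1 (the watermark label) for inputs within `radius`
    of `key` in Euclidean distance, and label 0 otherwise.
    """
    def model(inputs):
        dist = np.linalg.norm(inputs - key, axis=1)
        return (dist < radius).astype(int)
    return model


def watermark_response(model, inputs, target_label):
    """Fraction of inputs that the model maps to the watermark label."""
    return float(np.mean(model(inputs) == target_label))


def specificity_gap(model, samples, key, target_label,
                    noise_scale, n_approx=20, seed=0):
    """Assumed proxy for specificity: response on the exact key minus
    the mean response on randomly perturbed (approximate) keys.

    A value near 1 means only the exact key triggers the watermark;
    a value near 0 means approximate keys work just as well.
    """
    rng = np.random.default_rng(seed)
    exact = watermark_response(model, samples + key, target_label)
    approx = []
    for _ in range(n_approx):
        approx_key = key + rng.normal(0.0, noise_scale, size=key.shape)
        approx.append(
            watermark_response(model, samples + approx_key, target_label)
        )
    return exact - float(np.mean(approx))


# Toy data: small clean samples and an all-ones trigger key.
rng = np.random.default_rng(42)
samples = rng.normal(0.0, 0.1, size=(200, 16))
key = np.ones(16)

loose_model = make_toy_model(key, radius=4.0)  # responds to nearby keys too
tight_model = make_toy_model(key, radius=1.0)  # responds only near the key

gap_loose = specificity_gap(loose_model, samples, key, 1, noise_scale=0.5)
gap_tight = specificity_gap(tight_model, samples, key, 1, noise_scale=0.5)
print(f"loose model gap: {gap_loose:.2f}")
print(f"tight model gap: {gap_tight:.2f}")
```

In this toy setting, the loose model responds to perturbed keys almost as readily as to the exact key, so an attacker who recovers only an approximate key could still trigger the watermark and then remove it. The tight model has a much larger gap, which is the property that enhancing specificity aims for.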