MIRA: Cracking Black-box Watermarking on Deep Neural Networks via Model Inversion-based Removal Attacks

To protect the intellectual property of well-trained deep neural networks (DNNs), black-box DNN watermarks, which are embedded into the prediction behavior of DNN models on a set of specially crafted samples, have gained increasing popularity in both academia and industry. Watermark robustness is usually defined against attackers who steal the protected model and obfuscate its parameters to remove the watermark. Recent studies have empirically demonstrated the robustness of most black-box watermarking schemes against known removal attempts. In this paper, we propose a novel Model Inversion-based Removal Attack (\textsc{Mira}), which is watermark-agnostic and effective against most mainstream black-box DNN watermarking schemes. In general, our attack pipeline exploits the internals of the protected model to recover and unlearn the watermark message. We further design target class detection and recovered sample splitting algorithms to reduce the utility loss caused by \textsc{Mira} and to achieve data-free watermark removal on half of the watermarking schemes. We conduct a comprehensive evaluation of \textsc{Mira} against ten mainstream black-box watermarks on three benchmark datasets and DNN architectures. Compared with six baseline removal attacks, \textsc{Mira} achieves strong watermark removal on the covered watermarks while preserving at least $90\%$ of the stolen model's utility, under more relaxed assumptions on dataset availability, or none at all.
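Model inversion, the technique that gives the attack its name, can be illustrated on a toy model. The sketch below is a generic gradient-ascent inversion on a linear softmax classifier, not the authors' pipeline: the function names, the L2 regularizer, the restriction of inputs to the box [0, 1]^D and all hyperparameters are assumptions chosen for illustration. It uses white-box access to the model's weights to synthesize an input that the model assigns to a chosen class, which is the kind of "recovered" sample the abstract refers to before unlearning.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def invert_class(W, b, target, steps=200, lr=0.1, l2=0.01, seed=0):
    """Synthesize an input that a linear softmax model assigns to `target`.

    Hypothetical helper: gradient ascent on
        log p(target | x) - (l2 / 2) * ||x||^2,
    projecting x back onto [0, 1]^D after every step.
    W has shape (C, D) and b has shape (C,).
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=W.shape[1])  # random starting input
    for _ in range(steps):
        p = softmax(W @ x + b)  # class probabilities, shape (C,)
        # d/dx log p_target = W[target] - sum_k p_k * W[k]; minus the L2 term
        grad = W[target] - p @ W - l2 * x
        x = np.clip(x + lr * grad, 0.0, 1.0)  # ascent step + projection
    return x
```

For example, with `W = 4.0 * np.eye(3)` and `b = np.zeros(3)`, `invert_class(W, b, target=2)` drives the input toward the third coordinate, and `softmax(W @ x + b)` then places most of its mass on class 2. A real attack would run this against a deep network and then fine-tune the model so that the recovered samples no longer trigger the watermark behavior; that step is not shown here.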

Related content

Black box

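In science, computing, and engineering, a black box is a device, system, or object that can be viewed in terms of its inputs and outputs (or transfer characteristics), without any knowledge of its internal workings. Its implementation is "opaque" (black). Almost anything can be called a black box: a transistor, an engine, an algorithm, the human brain, an institution, or a government. When something modeled as an open system is analyzed with a typical "black-box approach", only its stimulus/response behavior is considered, and the (unknown) contents of the box are inferred from it. Such a system is usually represented as a data-flow diagram centered on the box. The opposite of a black box is a system whose internal components or logic are available for inspection, usually called a white box (sometimes also a "clear box" or "glass box").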
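Black-box DNN watermarks rely on exactly this stimulus/response view: ownership is checked only by querying the suspect model on the specially crafted samples and comparing its predictions with the watermark labels. A minimal sketch, assuming the model is exposed solely as a `predict` function that returns a class index; the function names and the 0.5 decision threshold are hypothetical and not taken from any particular watermarking scheme.

```python
from typing import Callable, Sequence


def watermark_accuracy(
    predict: Callable[[object], int],
    triggers: Sequence[object],
    wm_labels: Sequence[int],
) -> float:
    """Fraction of trigger samples on which the black-box model returns the watermark label."""
    if len(triggers) != len(wm_labels) or len(triggers) == 0:
        raise ValueError("triggers and wm_labels must be non-empty and of equal length")
    # Only inputs and outputs are used: no access to weights or gradients.
    hits = sum(1 for x, y in zip(triggers, wm_labels) if predict(x) == y)
    return hits / len(triggers)


def verify_ownership(
    predict: Callable[[object], int],
    triggers: Sequence[object],
    wm_labels: Sequence[int],
    threshold: float = 0.5,
) -> bool:
    """Claim ownership if the watermark accuracy reaches a (hypothetical) threshold."""
    return watermark_accuracy(predict, triggers, wm_labels) >= threshold
```

Under this check, a removal attack counts as successful when the watermark accuracy drops below the threshold while the model still performs well on its normal task.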

[CVPR 2022] Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

[CVPR 2022] Robust Equivariant Imaging: a fully unsupervised framework for learning to image from noisy and partial measurements

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation
