DeepTaster: Adversarial Perturbation-Based Fingerprinting to Identify Proprietary Dataset Use in Deep Neural Networks

Training deep neural networks (DNNs) requires large datasets and powerful computing resources, which has led some owners to prohibit redistribution of their data without permission. Watermarking techniques that embed confidential data into DNNs have been used to protect ownership, but they can degrade model performance and are vulnerable to watermark removal attacks. DeepJudge was recently introduced as an alternative that measures the similarity between a suspect model and a victim model. While DeepJudge shows promise in addressing the shortcomings of watermarking, it primarily covers cases where the suspect model copies the victim's architecture. In this study, we introduce DeepTaster, a novel DNN fingerprinting technique that addresses scenarios where a victim's data is unlawfully used to build a suspect model. DeepTaster can effectively identify such DNN model theft attacks even when the suspect model's architecture differs from the victim's. To accomplish this, DeepTaster generates adversarial images with perturbations, transforms them into the Fourier frequency domain, and uses the transformed images to identify the dataset used to build a suspect model. The underlying premise is that adversarial images can capture the unique characteristics of DNNs built with a specific dataset. To demonstrate its effectiveness, we evaluated DeepTaster's detection accuracy on three datasets (CIFAR10, MNIST, and Tiny-ImageNet) across three model architectures (ResNet18, VGG16, and DenseNet161). We conducted experiments under various attack scenarios, including transfer learning, pruning, fine-tuning, and data augmentation. Specifically, in the Multi-Architecture Attack scenario, DeepTaster identified all the stolen cases across all datasets, while DeepJudge failed to detect any of them.
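The first two steps described above (generating an adversarial perturbation and moving it into the Fourier frequency domain) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: a toy linear softmax classifier stands in for a DNN, the perturbation is a fast-gradient-sign (FGSM-style) step rather than whatever attack DeepTaster actually uses, and the names `fgsm_perturbation` and `fourier_fingerprint` are hypothetical. The final step, deciding from such spectra which dataset a suspect model was trained on, is not shown.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def fgsm_perturbation(W, b, x, label, eps=0.05):
    """FGSM-style perturbation for a toy linear softmax classifier.

    Assumption: the linear model (W, b) is only a stand-in for a DNN,
    and FGSM is a swapped-in attack chosen for brevity.
    """
    p = softmax(W @ x.ravel() + b)
    onehot = np.zeros_like(p)
    onehot[label] = 1.0
    # Gradient of the cross-entropy loss with respect to the input pixels.
    grad = W.T @ (p - onehot)
    return eps * np.sign(grad).reshape(x.shape)

def fourier_fingerprint(perturbation):
    """Log-magnitude 2-D DFT spectrum with the zero frequency centred."""
    spectrum = np.fft.fftshift(np.fft.fft2(perturbation))
    return np.log1p(np.abs(spectrum))

# Hypothetical toy setup: a random 32x32 grayscale image and random weights.
rng = np.random.default_rng(0)
height, width, num_classes = 32, 32, 10
W = rng.normal(scale=0.1, size=(num_classes, height * width))
b = np.zeros(num_classes)
x = rng.random((height, width))
label = 3

delta = fgsm_perturbation(W, b, x, label)
x_adv = np.clip(x + delta, 0.0, 1.0)          # keep pixels in a valid range
fingerprint = fourier_fingerprint(x_adv - x)  # spectrum of the applied perturbation
```

In this sketch the spectrum is computed on the perturbation actually applied after clipping, so it reflects what the model's gradient contributed rather than the content of the original image. Whether DeepTaster transforms the perturbation alone or the full adversarial image is not stated in the abstract.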
