Model inversion attacks (MIAs) aim to recover private data from a target model's training set, which threatens the privacy of deep learning models. Most MIAs focus on the white-box scenario, in which the attacker has full access to the structure and parameters of the target model. Practical applications, however, are usually black-box: adversaries cannot easily obtain model parameters, and many models output only predicted labels. Existing black-box MIAs have primarily focused on designing the optimization strategy, while the generative model is simply carried over from the GANs used in white-box MIAs. To the best of our knowledge, ours is the first study of feasible attack models in the label-only black-box scenario. In this paper, we develop a novel MIA method that uses a conditional diffusion model to recover precise samples of the target without any extra optimization, provided that the target model outputs a label. The attack proceeds in two steps. First, we select an auxiliary dataset relevant to the target model's task and use the labels predicted by the target model as conditions to guide the training process. Second, we feed target labels and random standard normal noise into the trained conditional diffusion model to generate target samples with a pre-defined guidance strength. We then filter the generated samples to keep the most robust and representative ones. Furthermore, we are the first to propose Learned Perceptual Image Patch Similarity (LPIPS) as an evaluation metric for MIA, and we provide a systematic quantitative and qualitative evaluation in terms of attack accuracy, realism, and similarity. Experimental results show that the method generates data that is similar to the target and accurately classified without optimization, and that it outperforms the generators of previous approaches in the label-only scenario.
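The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The names `pseudo_label`, `guided_noise`, `initial_noise`, and `toy_target_model` are hypothetical. The guidance rule is assumed to take the classifier-free guidance form, which the abstract does not specify. The diffusion network itself is omitted, and plain Python lists stand in for images and noise predictions.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def pseudo_label(
    aux_data: Sequence[Vector],
    target_model: Callable[[Vector], int],
) -> List[Tuple[Vector, int]]:
    """Step 1: query the label-only target model once per auxiliary sample.

    The returned (sample, label) pairs would serve as the conditional
    training set for the diffusion model.
    """
    return [(x, target_model(x)) for x in aux_data]


def guided_noise(eps_cond: Vector, eps_uncond: Vector, w: float) -> Vector:
    """Step 2 (assumed form): combine conditional and unconditional noise
    predictions with guidance strength w, as in classifier-free guidance:
        eps = (1 + w) * eps_cond - w * eps_uncond
    With w = 0 this reduces to the purely conditional prediction.
    """
    return [(1.0 + w) * c - w * u for c, u in zip(eps_cond, eps_uncond)]


def initial_noise(dim: int, seed: int = 0) -> Vector:
    """Draw the standard normal starting point for sampling."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def toy_target_model(x: Vector) -> int:
    """Hypothetical stand-in for a label-only black box: returns a class index only."""
    return 1 if sum(x) > 0 else 0


# Toy auxiliary data; real inputs would be images related to the target task.
aux = [[0.5, 0.2], [-1.0, 0.3], [0.1, -0.4]]
labeled = pseudo_label(aux, toy_target_model)
labels = [y for _, y in labeled]

# One guidance combination with made-up noise predictions and w = 2.
eps = guided_noise([1.0, 2.0], [0.5, 1.0], w=2.0)
x_T = initial_noise(dim=2, seed=0)
```

In a full pipeline, `guided_noise` would be evaluated at every reverse diffusion step, starting from `x_T`, with the target label as the condition. The filtering of robust and representative samples mentioned in the abstract is a separate step that is not sketched here.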