QUEEN: Query Unlearning against Model Extraction

Model extraction attacks currently pose a non-negligible threat to the security and privacy of deep learning models. By querying the model with a small dataset and using the query results as ground-truth labels, an adversary can train a piracy model whose performance is comparable to that of the original model. Two key issues cause the threat: first, the adversary can obtain accurate and unlimited query results; second, the adversary can aggregate those results to train the piracy model step by step. Existing defenses usually employ model watermarking or fingerprinting to protect ownership. However, these methods cannot proactively prevent the violation from happening. To mitigate the threat, we propose QUEEN (QUEry unlEarNing), which proactively launches counterattacks on potential model extraction attacks from the very beginning. To limit the potential threat, QUEEN combines sensitivity measurement with output perturbation, which together prevent the adversary from training a high-performance piracy model. In sensitivity measurement, QUEEN measures the sensitivity of a single query by its distance from the center of its cluster in the feature space. To reduce the learning accuracy of attacks, QUEEN applies query unlearning to highly sensitive query batches. Query unlearning is implemented by gradient reversal: the softmax output is perturbed so that the piracy model generates reverse gradients and unknowingly worsens its own performance. Experiments show that QUEEN outperforms the state-of-the-art defenses against various model extraction attacks at a relatively low cost to model accuracy. The artifact is publicly available at https://anonymous.4open.science/r/queen implementation-5408/.
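
The sensitivity measurement described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that feature vectors and cluster centers are already available as NumPy arrays, that a query belongs to the cluster with the nearest center, and that distance is Euclidean. The function name `nearest_center_distance` is hypothetical. The abstract does not say how the distance is turned into a sensitivity score or where the "highly sensitive" cutoff lies, so the sketch stops at the distance itself.

```python
import numpy as np

def nearest_center_distance(features, centers):
    """Return (cluster index, distance to that cluster's center) for each query.

    Assumptions (not from the abstract): nearest-center cluster assignment
    and Euclidean distance in the feature space.
    """
    features = np.asarray(features, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Pairwise distances between queries and centers, shape (n_queries, n_clusters).
    dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    assigned = dists.argmin(axis=1)
    return assigned, dists[np.arange(len(features)), assigned]

# Toy 2-D feature space with two cluster centers and two queries.
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
queries = np.array([[3.0, 4.0], [10.0, 9.0]])
assigned, distance = nearest_center_distance(queries, centers)
print(assigned, distance)
```

In this toy example the first query falls in cluster 0 at distance 5 from its center and the second in cluster 1 at distance 1. A defense along the lines of the abstract would map such distances to a sensitivity value and perturb the output only for batches judged highly sensitive.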

