Model extraction attacks pose a non-negligible threat to the security and privacy of deep learning models. By querying the model with a small dataset and using the query results as ground-truth labels, an adversary can train a piracy model whose performance is comparable to that of the original model. Two key issues cause the threat: first, the adversary can obtain accurate and unlimited query results; second, the adversary can aggregate those results to train the piracy model step by step. Existing defenses usually employ model watermarking or fingerprinting to protect ownership. However, these methods cannot proactively prevent the violation from happening. To mitigate the threat, we propose QUEEN (QUEry unlEarNing), which proactively launches counterattacks on potential model extraction attacks from the very beginning. To limit the potential threat, QUEEN combines sensitivity measurement with output perturbation, which together prevent the adversary from training a high-performance piracy model. In sensitivity measurement, QUEEN measures the sensitivity of a single query by its distance from the center of its cluster in the feature space. To reduce the learning accuracy of attacks, QUEEN applies query unlearning to highly sensitive query batches: it uses gradient reversal to perturb the softmax output so that the piracy model unknowingly generates reverse gradients that worsen its own performance. Experiments show that QUEEN outperforms state-of-the-art defenses against various model extraction attacks at a relatively low cost to model accuracy. The artifact is publicly available at https://anonymous.4open.science/r/queen implementation-5408/.
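The sensitivity measurement described above (distance of a query from the center of its cluster in the feature space) can be illustrated with a minimal sketch. This is not the QUEEN implementation. It assumes that clusters correspond to labeled groups of feature vectors, that a query belongs to its nearest cluster, and that distance is Euclidean. The function names `cluster_centers` and `distance_to_own_center` are hypothetical, and how the distance is turned into a sensitivity score or a threshold is not specified in the abstract and is left open here.

```python
import numpy as np

def cluster_centers(features, labels):
    """Mean feature vector per cluster (assumption: one cluster per label)."""
    ids = np.unique(labels)
    centers = np.stack([features[labels == k].mean(axis=0) for k in ids])
    return ids, centers

def distance_to_own_center(query_features, centers):
    """Return each query's distance to its nearest center and that center's index.

    Assumption: a query is assigned to the nearest cluster center.
    """
    # Pairwise Euclidean distances, shape (n_queries, n_clusters).
    diffs = query_features[:, None, :] - centers[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    nearest = dists.argmin(axis=1)
    return dists[np.arange(len(query_features)), nearest], nearest

# Toy feature vectors standing in for a model's feature-space embeddings.
feats = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]])
labels = np.array([0, 0, 1, 1])
ids, centers = cluster_centers(feats, labels)

queries = np.array([[1.0, 0.0], [10.0, 14.0]])
dist, assigned = distance_to_own_center(queries, centers)
# For this toy data: dist is [0., 3.] and assigned is [0, 1].
```

In this sketch the first query sits exactly on a cluster center and the second lies three units from its center; a defense would then decide, by some rule not given here, which of these distances marks a query as highly sensitive.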