Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks

Despite their broad deployment through Machine Learning as a Service (MLaaS), machine learning models are vulnerable to model stealing attacks. These attacks replicate a model's functionality through black-box queries alone, without any prior knowledge of the target victim model. Existing stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attacker. However, these defenses suffer from high inference-time computational overhead and an unfavorable trade-off between benign accuracy and stealing robustness, which limits the feasibility of deploying them in practice. To address these problems, this paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses. Instead of deploying auxiliary defense modules that add inference time, InI directly trains a defensive model by isolating the adversary's training gradient from the expected gradient, which effectively reduces the inference computational cost. In contrast to adding perturbations to model predictions, which harms benign accuracy, we train models to produce uninformative outputs for stealing queries, so that the adversary extracts little useful knowledge from the victim model with minimal impact on benign performance. Extensive experiments on several visual classification datasets (e.g., MNIST and CIFAR10) demonstrate the superior robustness (up to a 48% reduction in stealing accuracy) and speed (up to 25.4x faster) of InI over other state-of-the-art methods. Our code is available at https://github.com/DIG-Beihang/InI-Model-Stealing-Defense.
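The idea of training a model to give uninformative outputs on stealing queries can be illustrated with a toy objective. The sketch below is not the InI loss from the paper or its repository; it is a minimal illustration that assumes one plausible formulation: cross-entropy on a benign sample plus a penalty pulling the prediction on a suspected stealing query toward the uniform distribution. The function names (`kl_to_uniform`, `induction_style_loss`) and the `weight` parameter are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Standard cross-entropy for a single benign sample."""
    return -math.log(softmax(logits)[label])


def kl_to_uniform(logits):
    """KL(p || uniform) for p = softmax(logits).

    Zero when the prediction is uniform (uninformative) and
    larger as the prediction becomes more confident.
    """
    probs = softmax(logits)
    k = len(probs)
    # Terms with p == 0 contribute nothing (limit of p * log p is 0).
    return sum(p * math.log(p * k) for p in probs if p > 0.0)


def induction_style_loss(benign_logits, benign_label, query_logits, weight=1.0):
    """Hypothetical combined objective (an assumption, not the paper's loss):
    stay accurate on benign data, stay uninformative on stealing queries."""
    benign_term = cross_entropy(benign_logits, benign_label)
    query_term = kl_to_uniform(query_logits)
    return benign_term + weight * query_term


if __name__ == "__main__":
    # A confident, correct benign prediction with a flat query response
    # scores lower than the same prediction with a confident query response.
    flat = induction_style_loss([4.0, 0.0, 0.0], 0, [0.0, 0.0, 0.0])
    peaked = induction_style_loss([4.0, 0.0, 0.0], 0, [4.0, 0.0, 0.0])
    print(flat < peaked)
```

Under this assumed objective, all of the defensive behavior is learned at training time, so inference is a single forward pass with no auxiliary module, which is consistent with the speed argument in the abstract. The gradient isolation component described there is not modeled here.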

