Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks

Although Machine Learning as a Service (MLaaS) is widely deployed, the models it serves are vulnerable to model stealing attacks. These attacks replicate a model's functionality through black-box queries alone, without any prior knowledge of the victim model. Existing defenses add deceptive perturbations to the victim's posterior probabilities to mislead attackers. However, they suffer from high inference overhead and an unfavorable trade-off between benign accuracy and stealing robustness, which limits their feasibility for deployed models. To address these problems, this paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses. Instead of deploying auxiliary defense modules that add inference time, InI directly trains a defensive model by isolating the adversary's training gradient from the expected gradient, which effectively reduces the inference cost. Instead of perturbing model predictions, which harms benign accuracy, InI trains the model to produce uninformative outputs for stealing queries, so that the adversary extracts little useful knowledge from the victim model, with minimal impact on benign performance. Extensive experiments on several visual classification datasets (e.g., MNIST and CIFAR10) demonstrate the superior robustness (up to 48% reduction in stealing accuracy) and speed (up to 25.4x faster) of InI over other state-of-the-art methods. The code is available at https://github.com/DIG-Beihang/InI-Model-Stealing-Defense.
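The "induction" idea in the abstract, training the model to give uninformative outputs for stealing queries while staying accurate on benign inputs, can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that). It assumes that "uninformative" means close to a uniform distribution, that stealing queries are available as a separate batch, and that the two terms are combined with a hypothetical weight `lam`. All function names are illustrative, and the gradient-isolation term is omitted.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Standard classification loss for one benign sample."""
    return -math.log(softmax(logits)[label])

def kl_to_uniform(logits):
    """KL(uniform || p): zero exactly when the prediction p is uniform."""
    p = softmax(logits)
    u = 1.0 / len(p)
    return sum(u * math.log(u / pi) for pi in p)

def defensive_loss(benign_logits, benign_label, query_logits, lam=1.0):
    """Assumed combined objective: fit benign data, flatten outputs on queries.

    lam is a hypothetical trade-off weight, not a value from the paper.
    """
    return cross_entropy(benign_logits, benign_label) + lam * kl_to_uniform(query_logits)

if __name__ == "__main__":
    # A confident answer to a stealing query is penalised; a flat one is not.
    print(kl_to_uniform([5.0, 0.0, 0.0]) > kl_to_uniform([0.0, 0.0, 0.0]))
```

Under these assumptions, minimising the second term pushes predictions on stealing queries toward a flat distribution, which carries little information for a clone model, while the first term preserves benign accuracy. Because the defense lives in the trained weights, no extra module runs at inference time.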

