Although machine learning models are widely deployed as a service (MLaaS), they are vulnerable to model stealing attacks, which replicate a model's functionality through black-box queries without any prior knowledge of the victim model. Existing defenses add deceptive perturbations to the victim's posterior probabilities to mislead attackers. However, these defenses suffer from high inference overhead and an unfavorable trade-off between benign accuracy and stealing robustness, which limits their practicality for deployed models. To address these problems, this paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses. Instead of deploying auxiliary defense modules that add inference time, InI directly trains a defensive model by isolating the adversary's training gradient from the expected gradient, which effectively reduces the inference cost. Instead of perturbing model predictions, which harms benign accuracy, we train models to produce uninformative outputs for stealing queries, so that the adversary extracts little useful knowledge from the victim model while benign performance is minimally affected. Extensive experiments on several visual classification datasets (e.g., MNIST and CIFAR10) demonstrate the superior robustness (up to a 48% reduction in stealing accuracy) and speed (up to 25.4x faster) of InI over other state-of-the-art methods. Our code is available at https://github.com/DIG-Beihang/InI-Model-Stealing-Defense.
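The idea of training a model to produce uninformative outputs for stealing queries can be illustrated with a minimal sketch. This is not the InI objective from the paper or its repository: it assumes, purely for illustration, that "uninformative" means close to a uniform distribution, and it omits the gradient isolation component entirely. The function names (`kl_from_uniform`, `combined_loss`) and the `weight` parameter are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def cross_entropy(logits, label):
    """Standard classification loss on a benign, labelled sample."""
    return -log_softmax(logits)[label]


def kl_from_uniform(logits):
    """KL(uniform || softmax(logits)).

    Assumed stand-in for an 'uninformative output' penalty: it is zero
    when the prediction is uniform and grows as the prediction sharpens.
    """
    log_probs = log_softmax(logits)
    k = len(log_probs)
    log_u = -math.log(k)
    return sum((1.0 / k) * (log_u - lp) for lp in log_probs)


def combined_loss(benign_logits, benign_label, query_logits, weight=1.0):
    """Benign accuracy term plus a weighted penalty on a suspected stealing query.

    `weight` is a hypothetical trade-off knob, not a value from the paper.
    """
    return cross_entropy(benign_logits, benign_label) + weight * kl_from_uniform(query_logits)
```

Under these assumptions, a model that answers a suspected stealing query with flat logits pays no penalty, while a confident answer on the same query increases the loss; the benign term is unaffected either way.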