Despite the broad adoption of Machine Learning as a Service (MLaaS), deployed models are vulnerable to model stealing attacks. These attacks replicate a model's functionality through black-box queries alone, without any prior knowledge of the victim model. Existing defenses add deceptive perturbations to the victim's posterior probabilities to mislead attackers. However, these defenses suffer from high inference-time computational overhead and an unfavorable trade-off between benign accuracy and stealing robustness, which limits their feasibility in practical deployments. To address these problems, this paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defense. Instead of deploying auxiliary defense modules that add inference time, InI directly trains a defensive model by isolating the adversary's training gradient from the expected gradient, which effectively reduces the inference cost. Instead of perturbing model predictions, which harms benign accuracy, we train models to produce uninformative outputs in response to stealing queries, so that the adversary extracts little useful knowledge from the victim model, with minimal impact on benign performance. Extensive experiments on several visual classification datasets (e.g., MNIST and CIFAR10) demonstrate the superior robustness (up to 48% reduction in stealing accuracy) and speed (up to 25.4x faster) of InI over other state-of-the-art methods. Our code is available at https://github.com/DIG-Beihang/InI-Model-Stealing-Defense.
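The abstract does not state the training objective, so the following is only a minimal sketch of the induction idea (uninformative outputs on stealing queries), not the authors' formulation. It assumes that "uninformative" can be approximated by pushing the predicted distribution on suspected stealing queries toward uniform, measured with a KL term, while ordinary cross-entropy is kept on benign inputs. The function names, the `weight` parameter, and the choice of a uniform target are all hypothetical, and the gradient isolation component is not modeled here.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def cross_entropy(logits, label):
    """Standard cross-entropy for one benign sample with an integer label."""
    return -log_softmax(logits)[label]


def kl_from_uniform(logits):
    """KL(uniform || softmax(logits)); zero exactly when the output is uniform.

    Assumption: a uniform posterior stands in for an 'uninformative' output.
    """
    log_probs = log_softmax(logits)
    k = len(log_probs)
    return -math.log(k) - sum(log_probs) / k


def combined_loss(benign_logits, benign_label, query_logits, weight=1.0):
    """Hypothetical objective: benign accuracy term plus induction term."""
    benign_term = cross_entropy(benign_logits, benign_label)
    induction_term = kl_from_uniform(query_logits)
    return benign_term + weight * induction_term


if __name__ == "__main__":
    # A confident answer to a stealing query is penalized,
    # while a flat answer is not.
    print(kl_from_uniform([5.0, 0.0, 0.0]) > kl_from_uniform([0.0, 0.0, 0.0]))
```

Under these assumptions, a model that answers stealing queries with a flat posterior incurs no induction penalty, so the only remaining pressure on its parameters comes from the benign term.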