Model stealing attacks have become a serious concern for deep learning models: an attacker can steal a trained model by querying its black-box API, leading to intellectual property theft and other security and privacy risks. Current state-of-the-art defenses against model stealing attacks add perturbations to the prediction probabilities. However, they are computationally expensive and make impractical assumptions about the adversary. They often require training auxiliary models, which is time-consuming and resource-intensive and hinders their deployment in real-world applications. In this paper, we propose a simple yet effective and efficient alternative defense. We introduce a heuristic approach to perturb the output probabilities. The proposed defense can be easily integrated into models without additional training. We show that it is effective against three state-of-the-art stealing attacks. We evaluate our approach on large and quantized (i.e., compressed) Convolutional Neural Networks (CNNs) trained on several vision datasets. Our technique outperforms state-of-the-art defenses with $37\times$ lower inference latency, without requiring any additional model and with a low impact on the model's performance. We also validate that our defense is effective for quantized CNNs targeting edge devices.
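The abstract does not describe the heuristic itself, so the following is only a minimal sketch of the general idea of perturbing output probabilities without auxiliary models or extra training. It is not the method proposed in the paper. The function name `perturb_probabilities`, the multiplicative noise model, and the `noise_scale` parameter are all assumptions introduced for illustration. The sketch distorts the probability vector returned to the client while keeping the predicted (top-1) class unchanged.

```python
import random


def perturb_probabilities(probs, noise_scale=0.2, seed=None):
    """Return a noisy copy of `probs` that keeps the original top-1 class.

    Hypothetical illustration only: the noise model and `noise_scale`
    are assumptions, not the heuristic proposed in the paper.
    Expects at least two classes and 0 <= noise_scale < 1.
    """
    rng = random.Random(seed)
    top = max(range(len(probs)), key=lambda i: probs[i])

    # Scale each probability by a random positive factor in
    # [1 - noise_scale, 1 + noise_scale].
    noisy = [p * (1.0 + noise_scale * rng.uniform(-1.0, 1.0)) for p in probs]

    # If the noise displaced the original top-1 class, lift it back
    # just above the largest competing value.
    other_max = max(v for i, v in enumerate(noisy) if i != top)
    if noisy[top] <= other_max:
        noisy[top] = other_max + 1e-6

    # Renormalize so the output is again a probability distribution.
    total = sum(noisy)
    return [v / total for v in noisy]


if __name__ == "__main__":
    clean = [0.70, 0.20, 0.10]
    print(perturb_probabilities(clean, seed=0))
```

A post-processing step of this kind runs after the model's softmax output, which is why it needs neither retraining nor an additional model. How well any particular perturbation resists stealing attacks is an empirical question that this sketch does not address.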