The rise of Machine Learning as a Service (MLaaS) has led to the widespread deployment of machine learning models trained on diverse datasets. These models are exposed as predictive services through APIs, and emerging vulnerabilities in prediction APIs raise concerns about the security and confidentiality of the models. Of particular concern are model cloning attacks, in which adversaries with limited data and no knowledge of the training dataset replicate a victim model's functionality through black-box query access. Such attacks typically generate adversarial queries, submit them to the victim model, and use its responses to build a labeled dataset. This paper proposes "MisGUIDE", a two-step defense framework for deep learning models that disrupts the adversarial sample generation process by providing a probabilistic response when a query is deemed out-of-distribution (OOD). The first step employs a Vision Transformer-based framework to identify OOD queries, while the second step perturbs the response for such queries, introducing a probabilistic loss function to MisGUIDE the attackers. The aim of the proposed defense is to reduce the accuracy of the cloned model while maintaining accuracy on authentic queries. Extensive experiments on two benchmark datasets demonstrate that the proposed framework significantly enhances resistance against state-of-the-art data-free model extraction attacks in black-box settings.
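To make the two-step idea concrete, the following is a minimal sketch under stated assumptions, not the paper's implementation. It assumes an OOD score in [0, 1] has already been produced by some detector (the Vision Transformer-based detector is not shown), and it stands in for the response perturbation with a simple swap of the top probability onto a randomly chosen wrong class. The names `defended_predict`, `ood_score`, `threshold`, and `mislead_prob` are hypothetical and chosen only for illustration.

```python
import random


def defended_predict(probs, ood_score, threshold=0.5, mislead_prob=0.7, rng=None):
    """Return the response sent to the client for one query.

    probs        : the protected model's class probabilities (list of floats,
                   at least two classes).
    ood_score    : hypothetical detector output in [0, 1]; higher = more likely OOD.
    threshold    : assumed cut-off above which a query is treated as OOD.
    mislead_prob : assumed probability of perturbing the response for an OOD query.
    """
    if rng is None:
        rng = random.Random()

    # Step 1: queries judged in-distribution get the honest prediction.
    if ood_score < threshold:
        return list(probs)

    # Step 2: for OOD queries, perturb the response only with probability mislead_prob.
    if rng.random() >= mislead_prob:
        return list(probs)

    # Illustrative perturbation (an assumption): swap the top-class probability
    # with that of a randomly chosen other class, so the vector still sums to 1.
    top = max(range(len(probs)), key=probs.__getitem__)
    wrong = rng.choice([i for i in range(len(probs)) if i != top])
    out = list(probs)
    out[top], out[wrong] = out[wrong], out[top]
    return out
```

In this sketch, authentic queries pass through unchanged, which reflects the stated goal of preserving accuracy on authentic queries, while a fraction of OOD queries receive a label that would mislead a clone trained on the responses.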