Many machine learning models are vulnerable to model extraction attacks, which steal a model by issuing specially curated queries against it. Such attacks work well when the adversary can use part of the training data or a surrogate dataset to train a new model that mimics the target model in a white-box environment. In practical settings, however, target models are trained on private datasets that are inaccessible to the adversary. Data-free model extraction overcomes this problem by using queries synthesized by a generator similar to the one used in Generative Adversarial Nets. We propose, for the first time to the best of our knowledge, an adversarial black-box attack that extends to a regression problem: predicting bounding box coordinates in object detection. In our study, we found that the definition of the loss function and a novel generator setup are key aspects of extracting the target model. We find that the proposed model extraction method achieves significant results using a reasonable number of queries. The discovery of this vulnerability in object detection will support future work on securing such models.