Machine Learning (ML) models become vulnerable to Model Stealing Attacks (MSA) when they are deployed as a service. In such attacks, the attacker repeatedly queries the deployed model to build a labelled dataset, which is then used to train a thief model that mimics the original model. To maximize query efficiency, the attacker has to select the most informative subset of data points from the pool of available data. Existing attack strategies use approaches such as Active Learning and Semi-Supervised Learning to minimize query costs. However, in the black-box setting, these approaches may select sub-optimal samples because they train only a single thief model. Depending on the thief model's capacity and the data it was pretrained on, the model might even select noisy samples that harm the learning process. In this work, we explore the use of an ensemble of deep learning models as our thief model. We call our attack Army of Thieves (AOT), as we train multiple models of varying complexity to leverage the wisdom of the crowd. Based on the ensemble's collective decision, uncertain samples are selected for querying, while the most confident samples are directly included in the training data. Our approach is the first to use an ensemble of thief models to perform model extraction. We outperform the base approaches of existing state-of-the-art methods by at least 3% and achieve 21% higher adversarial sample transferability than previous work for models trained on the CIFAR-10 dataset.
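The selection step described above (query the samples the ensemble is unsure about, pseudo-label the ones it agrees on confidently) can be illustrated with a minimal sketch. It assumes the ensemble's uncertainty is measured as the entropy of the members' averaged class probabilities and that "most confident" means an averaged top-class probability above a fixed threshold; these criteria, and the names `split_pool`, `query_budget` and `confidence_threshold`, are hypothetical choices for illustration, not AOT's exact procedure.

```python
import math
from typing import List, Sequence, Tuple

# ensemble_probs[m][i][c]: probability that ensemble member m assigns class c to pool sample i.
Probs = Sequence[Sequence[Sequence[float]]]


def mean_probs(ensemble_probs: Probs) -> List[List[float]]:
    """Average the class probabilities of all ensemble members for each sample."""
    n_models = len(ensemble_probs)
    n_samples = len(ensemble_probs[0])
    n_classes = len(ensemble_probs[0][0])
    return [
        [sum(ensemble_probs[m][i][c] for m in range(n_models)) / n_models
         for c in range(n_classes)]
        for i in range(n_samples)
    ]


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one probability vector; higher means less certain."""
    return -sum(x * math.log(x) for x in p if x > 0)


def split_pool(ensemble_probs: Probs, query_budget: int,
               confidence_threshold: float) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Hypothetical selection rule (an assumption, not the paper's exact criterion).

    Returns the indices to send to the victim model (most uncertain first) and
    (index, pseudo_label) pairs for confident samples that skip querying.
    """
    avg = mean_probs(ensemble_probs)
    scores = [entropy(p) for p in avg]
    # Rank the pool by ensemble uncertainty, most uncertain first.
    order = sorted(range(len(avg)), key=lambda i: scores[i], reverse=True)
    to_query = order[:query_budget]
    queried = set(to_query)

    pseudo_labeled = []
    for i, p in enumerate(avg):
        if i in queried:
            continue
        top_class = max(range(len(p)), key=p.__getitem__)
        # Keep only samples on which the averaged ensemble is confident enough.
        if p[top_class] >= confidence_threshold:
            pseudo_labeled.append((i, top_class))
    return to_query, pseudo_labeled


if __name__ == "__main__":
    # Two thief models, four unlabelled pool samples, two classes.
    model_a = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
    model_b = [[0.5, 0.5], [0.95, 0.05], [0.1, 0.9], [0.3, 0.7]]
    to_query, pseudo = split_pool([model_a, model_b], query_budget=2,
                                  confidence_threshold=0.8)
    print(to_query)  # [0, 3]
    print(pseudo)    # [(1, 0), (2, 1)]
```

In this toy pool, samples 0 and 3 are the ones where the two models disagree or hover near 0.5, so they are spent on victim queries, while samples 1 and 2 receive ensemble pseudo-labels for free. Averaging probabilities is only one way to form a "collective decision"; majority voting or disagreement-based scores are equally plausible alternatives.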