Model stealing attacks, in which adversaries build high-fidelity surrogate models, are a significant threat to the intellectual property of machine learning services. Conventional wisdom holds that such surrogates give adversaries economic leverage comparable to that of the original service provider. This paper challenges that assumption by evaluating model stealing attacks on criteria beyond fidelity to the target model. Because query-based extraction provides only partial supervision of the target's input-output behavior, the surrogate is not uniquely identified: many near-optimal surrogates can achieve comparable fidelity while differing in deployment-relevant properties. Instead of performing a classic learning-based model stealing attack, we compute the Rashomon set of surrogate models (i.e., the set of almost-equally-accurate models) and evaluate its diversity using multiplicity metrics (ambiguity, discrepancy, and Rashomon Capacity) and group fairness metrics. Across tabular, medical imaging, and NLP tasks, our experiments on real-world datasets reveal that surrogate models with similar fidelity to the target model can vary substantially in other critical performance metrics. These findings cast doubt on the presumed equivalence between high-fidelity surrogates and the target model in practical deployment scenarios.
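To make two of the multiplicity metrics concrete, the following is a minimal sketch and not the paper's implementation. It assumes the common definitions from the predictive multiplicity literature: ambiguity is the fraction of inputs whose predicted label changes under at least one competing model, and discrepancy is the largest fraction of inputs on which any single competing model disagrees with the baseline. The function names, the choice of one surrogate as the baseline, and the toy predictions are all hypothetical.

```python
from typing import List, Sequence


def _flips(baseline: Sequence[int], competing: Sequence[Sequence[int]]) -> List[List[bool]]:
    """For each competing model, mark the inputs where it disagrees with the baseline."""
    return [[p != b for p, b in zip(preds, baseline)] for preds in competing]


def ambiguity(baseline: Sequence[int], competing: Sequence[Sequence[int]]) -> float:
    """Fraction of inputs whose label changes under at least one competing model."""
    flips = _flips(baseline, competing)
    n = len(baseline)
    changed = sum(any(row[i] for row in flips) for i in range(n))
    return changed / n


def discrepancy(baseline: Sequence[int], competing: Sequence[Sequence[int]]) -> float:
    """Largest fraction of inputs on which a single competing model disagrees."""
    flips = _flips(baseline, competing)
    return max(sum(row) / len(baseline) for row in flips)


# Hypothetical toy data: predicted labels from one reference surrogate and
# three other surrogates assumed to have comparable fidelity to the target.
baseline_preds = [0, 1, 1, 0, 1]
competing_preds = [
    [0, 1, 1, 0, 1],  # agrees with the baseline everywhere
    [0, 1, 0, 0, 1],  # disagrees on input 2
    [1, 1, 1, 0, 1],  # disagrees on input 0
]

print("ambiguity:", ambiguity(baseline_preds, competing_preds))
print("discrepancy:", discrepancy(baseline_preds, competing_preds))
```

In this toy case two of the five inputs receive conflicting labels from some surrogate, while no single surrogate disagrees with the baseline on more than one input, so ambiguity exceeds discrepancy. That gap is the kind of diversity that fidelity alone does not reveal.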