The deployment of machine learning models in operational contexts represents a significant investment for any organisation. Consequently, the risk of these models being misappropriated by competitors must be addressed. In recent years, numerous proposals have been put forth to detect instances of model stealing. However, these proposals operate under implicit and disparate data and model access assumptions; as a consequence, it remains unclear how they can be meaningfully compared. Our evaluation shows that a simple baseline we introduce performs on par with existing state-of-the-art fingerprints, despite being far less complex. To uncover the reasons behind this intriguing result, this paper introduces a systematic approach to both the creation of model fingerprinting schemes and their evaluation benchmarks. By dividing model fingerprinting into three core components -- Query, Representation and Detection (QuRD) -- we are able to identify $\sim100$ previously unexplored QuRD combinations and gain insights into their performance. Finally, we introduce a set of metrics to compare and guide the creation of more representative model stealing detection benchmarks. Our approach reveals the need for more challenging benchmarks and for sound comparisons with baselines. To foster the creation of new fingerprinting schemes and benchmarks, we open-source our fingerprinting toolbox.
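The QuRD decomposition described above can be pictured as three pluggable components wired into a single fingerprinting pipeline. The sketch below is purely illustrative and assumes nothing about the paper's actual toolbox: every name (`QuRDFingerprint`, `agreement_detector`, etc.) is hypothetical, and the detector shown is a deliberately trivial agreement-rate baseline.

```python
# Illustrative sketch of the Query / Representation / Detection (QuRD)
# decomposition. All names are hypothetical; the open-sourced toolbox's
# real API may differ substantially.
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class QuRDFingerprint:
    # Query: chooses the inputs sent to both models.
    query: Callable[[], Sequence[object]]
    # Representation: maps a model's outputs on the queries to features.
    represent: Callable[[Callable, Sequence[object]], Sequence[object]]
    # Detection: decides from the two representations whether the
    # suspect model was derived from the victim.
    detect: Callable[[Sequence[object], Sequence[object]], bool]

    def is_stolen(self, victim: Callable, suspect: Callable) -> bool:
        queries = self.query()
        return self.detect(self.represent(victim, queries),
                           self.represent(suspect, queries))


# A trivial baseline instance: a fixed query set, raw model outputs as
# the representation, and an output-agreement threshold as the detector.
def agreement_detector(a, b, threshold=0.9):
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a) >= threshold


fp = QuRDFingerprint(
    query=lambda: [0, 1, 2, 3],
    represent=lambda model, qs: [model(q) for q in qs],
    detect=agreement_detector,
)
```

Swapping any one of the three callables yields a different QuRD combination, which is how the component view exposes the large space of unexplored schemes mentioned above.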