A Fingerprint for Large Language Models

Recent advances show that scaling pre-trained language models can achieve state-of-the-art performance on many downstream tasks, which has made large language models (LLMs) a major research topic in artificial intelligence. Because training an LLM from scratch is resource-intensive, protecting the intellectual property of LLMs against infringement is both urgent and crucial. We therefore propose a novel black-box fingerprinting technique for LLMs that requires neither model training nor model fine-tuning. We first demonstrate that the outputs of an LLM span a unique vector space associated with that model. We then cast ownership authentication as the task of evaluating the similarity between the victim model's space and the suspect model's output space. We propose two solutions. The first verifies whether the outputs of the suspect model lie in the same space as those of the victim model, enabling rapid identification of model infringement. The second reconstructs the union of the vector spaces of the LLM outputs and the victim model, addressing cases where the victim model has undergone Parameter-Efficient Fine-Tuning (PEFT) attacks. Experimental results indicate that the proposed technique achieves superior performance in ownership verification and robustness against PEFT attacks. This work reveals inherent characteristics of LLMs and offers a promising solution for ownership verification of LLMs in black-box scenarios, one that is efficient, general, and practical.
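To make the first solution more concrete, the following is a minimal sketch of a subspace-membership test, not the authors' implementation. It assumes, purely for illustration, that each model's output vectors are a linear image of a lower-dimensional hidden state, so that they lie in a low-dimensional subspace of the output space. The function names (`orthonormal_basis`, `residual_ratio`, `demo`), the toy dimensions, and the synthetic random "models" are all hypothetical.

```python
import numpy as np


def orthonormal_basis(samples: np.ndarray, tol: float = 1e-8) -> np.ndarray:
    """Return an orthonormal basis (as columns) for the span of the rows of `samples`."""
    # Left singular vectors of samples^T with non-negligible singular values
    # span the same space as the sampled output vectors.
    u, s, _ = np.linalg.svd(samples.T, full_matrices=False)
    rank = int(np.sum(s > tol * s[0]))
    return u[:, :rank]


def residual_ratio(basis: np.ndarray, vectors: np.ndarray) -> np.ndarray:
    """For each row vector, the fraction of its norm lying outside span(basis)."""
    projected = vectors @ basis @ basis.T
    residual = np.linalg.norm(vectors - projected, axis=1)
    return residual / np.linalg.norm(vectors, axis=1)


def demo(seed: int = 0):
    """Compare outputs of a toy 'victim' and an unrelated toy model (synthetic data)."""
    rng = np.random.default_rng(seed)
    vocab, hidden = 200, 16  # toy sizes, chosen only for illustration

    # Assumption: each toy model maps hidden states to outputs through its own
    # linear layer, so its outputs live in a `hidden`-dimensional subspace.
    victim_head = rng.standard_normal((vocab, hidden))
    other_head = rng.standard_normal((vocab, hidden))

    # Estimate the victim's output space from sampled outputs.
    victim_outputs = rng.standard_normal((64, hidden)) @ victim_head.T
    basis = orthonormal_basis(victim_outputs)

    # Fresh outputs from the same model versus from an unrelated model.
    same = rng.standard_normal((8, hidden)) @ victim_head.T
    diff = rng.standard_normal((8, hidden)) @ other_head.T
    return residual_ratio(basis, same).max(), residual_ratio(basis, diff).min()


if __name__ == "__main__":
    same_max, diff_min = demo()
    print(f"same-model residual (max):  {same_max:.2e}")
    print(f"other-model residual (min): {diff_min:.2f}")
```

In this toy setting, outputs drawn from the same model leave a residual near zero after projection onto the estimated space, while outputs from an unrelated model leave most of their norm outside it. A real verification would need output vectors obtained from actual model queries and a decision threshold chosen empirically.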

