Recent advances show that scaling pre-trained language models can achieve state-of-the-art performance on many downstream tasks, making large language models (LLMs) a major research topic in artificial intelligence. However, because training LLMs from scratch is resource-intensive, protecting their intellectual property against infringement is both urgent and crucial. This motivates us to propose a novel black-box fingerprinting technique for LLMs that requires neither model training nor model fine-tuning. We first demonstrate that the outputs of an LLM span a unique vector space associated with that model. We then formulate ownership authentication as the task of evaluating the similarity between the victim model's space and the suspect model's output space. We propose two solutions to this problem. The first verifies whether the outputs of the suspect model lie in the same space as those of the victim model, enabling rapid identification of model infringement. The second reconstructs the union of the vector spaces for the LLM outputs and the victim model, addressing cases where the victim model has undergone Parameter-Efficient Fine-Tuning (PEFT) attacks. Experimental results indicate that the proposed technique achieves superior performance in ownership verification and is robust against PEFT attacks. This work reveals inherent characteristics of LLMs and provides a promising solution for ownership verification of LLMs in black-box scenarios that is efficient, general, and practical.
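To make the idea of testing whether outputs "lie in the same space" concrete, the following is a minimal sketch and not the method evaluated in this paper. It assumes that model outputs are available as real-valued vectors (for example, logits), builds an orthonormal basis for the span of the victim's outputs with Gram-Schmidt, and measures how much of each suspect output falls outside that span. The function names, the tolerance, and the threshold are hypothetical choices made for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    return math.sqrt(dot(v, v))


def orthonormal_basis(vectors, tol=1e-9):
    """Modified Gram-Schmidt: orthonormal basis for span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        n = norm(w)
        # Keep w only if it adds a new direction (relative tolerance).
        if n > tol * norm(v):
            basis.append([wi / n for wi in w])
    return basis


def residual_ratio(v, basis):
    """Fraction of v's norm outside span(basis); 0 means v lies in the span."""
    w = list(v)
    for b in basis:
        c = dot(w, b)
        w = [wi - c * bi for wi, bi in zip(w, b)]
    n = norm(v)
    return norm(w) / n if n > 0 else 0.0


def shares_output_space(victim_outputs, suspect_outputs, threshold=1e-6):
    """True if every suspect output lies (numerically) in the victim's span.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    basis = orthonormal_basis(victim_outputs)
    ratios = [residual_ratio(s, basis) for s in suspect_outputs]
    return bool(ratios) and max(ratios) <= threshold


# Toy data: victim outputs span a 2-dimensional subspace of R^4.
victim = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 1, 1]]
derived = [[2, 3, 2, 3]]    # a linear combination of victim outputs
unrelated = [[1, 0, 0, 0]]  # has a component outside the victim's span

print(shares_output_space(victim, derived))
print(shares_output_space(victim, unrelated))
```

In this toy setting the residual ratio acts as a simple similarity score between spaces. A practical check would need a threshold calibrated to numerical noise and to the way outputs are obtained through the black-box interface, neither of which is specified here.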