Fingerprinting large language models (LLMs) is essential for verifying model ownership, ensuring authenticity, and preventing misuse. Traditional fingerprinting methods often require significant computational overhead or white-box verification access. In this paper, we introduce UTF, a novel and efficient approach to fingerprinting LLMs that leverages under-trained tokens: tokens the model has not fully learned during its training phase. Using these tokens, we perform supervised fine-tuning to embed specific input-output pairs into the model, so that the LLM produces predetermined outputs when presented with particular inputs, effectively embedding a unique fingerprint. Our method incurs minimal overhead, has little impact on the model's performance, and does not require white-box access to the target model for ownership identification. Compared with existing fingerprinting methods, UTF is also more effective and more robust to fine-tuning and random guessing.
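The core idea above — pick tokens the model has not fully learned and pair them into fingerprint input-output examples for supervised fine-tuning — can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the embedding table, the small-norm heuristic for detecting under-trained tokens, and the helper names are all assumptions for illustration (a real pipeline would read the embedding matrix from the model checkpoint and then run SFT on the resulting pairs).

```python
import math

# Hypothetical toy vocabulary: token id -> embedding vector.
# In a real LLM these rows would come from the model's input embedding
# matrix; tokens whose embeddings barely moved from initialization
# (e.g., unusually small norm) are candidate under-trained tokens.
embeddings = {
    0: [0.9, -1.1, 0.8],      # well-trained token
    1: [0.02, 0.01, -0.03],   # under-trained: tiny norm
    2: [1.2, 0.7, -0.9],
    3: [0.01, -0.02, 0.02],   # under-trained: tiny norm
    4: [-0.8, 1.0, 1.1],
}

def select_under_trained(embeddings, k):
    """Return the k token ids with the smallest embedding L2 norm."""
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return sorted(embeddings, key=lambda t: norm(embeddings[t]))[:k]

def make_fingerprint_pair(embeddings, k, target_ids):
    """Build one (input, output) fingerprint pair.

    The input is a sequence of under-trained token ids; the output is a
    chosen target sequence. Supervised fine-tuning on such pairs would
    teach the model to emit target_ids whenever it sees this input.
    """
    x = select_under_trained(embeddings, k)
    return x, target_ids

x, y = make_fingerprint_pair(embeddings, k=2, target_ids=[7, 8, 9])
print(x, y)
```

Because under-trained tokens rarely occur in natural text, fine-tuning on such pairs changes behavior only on these contrived inputs, which is why the fingerprint has little effect on ordinary performance and is hard to trigger by random guessing.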