The substantial investment required to develop Large Language Models (LLMs) makes them valuable intellectual property, raising significant concerns about copyright protection. LLM fingerprinting has emerged as a key technique to address this: it verifies a model's origin by extracting an intrinsic, unique signature (a "fingerprint") and comparing it to that of a source model to identify illicit copies. However, existing black-box fingerprinting methods often fail to generate distinctive LLM fingerprints. This ineffectiveness arises because black-box methods typically rely on model outputs, which lose critical information about the model's unique parameters due to the use of non-linear functions. To address this, we first leverage Fisher Information Theory to formally demonstrate that the gradient with respect to the model's input is a more informative feature for fingerprinting than the output. Based on this insight, we propose ZeroPrint, a novel method that approximates these information-rich gradients in a black-box setting using zeroth-order estimation. ZeroPrint overcomes the challenge of applying this to discrete text by simulating input perturbations via semantic-preserving word substitutions. This operation allows ZeroPrint to estimate the model's Jacobian matrix as a unique fingerprint. Experiments on the standard benchmark show that ZeroPrint achieves state-of-the-art effectiveness and robustness, significantly outperforming existing black-box methods.
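To make the core idea concrete, the sketch below illustrates generic zeroth-order (finite-difference) Jacobian estimation of a black-box function, the numerical principle the abstract invokes. This is a minimal illustration with hypothetical names (`zeroth_order_jacobian`, a toy linear "model"), not ZeroPrint itself: the actual method perturbs discrete text via semantic-preserving word substitutions rather than adding numeric offsets to a continuous input.

```python
import numpy as np

def zeroth_order_jacobian(f, x, eps=1e-4):
    """Estimate the Jacobian of a black-box vector function f at x
    using central finite differences: only evaluations of f are used,
    never its analytic gradients. Hypothetical helper for illustration."""
    x = np.asarray(x, dtype=float)
    y0 = np.asarray(f(x), dtype=float)
    jac = np.zeros((y0.size, x.size))
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps  # perturb one input coordinate at a time
        y_plus = np.asarray(f(x + e), dtype=float)
        y_minus = np.asarray(f(x - e), dtype=float)
        jac[:, i] = (y_plus - y_minus) / (2 * eps)
    return jac

# Toy black-box "model": a linear map, so the estimated Jacobian
# should recover the weight matrix W up to finite-difference error.
W = np.array([[2.0, -1.0],
              [0.5, 3.0]])
J = zeroth_order_jacobian(lambda x: W @ x, np.array([1.0, 1.0]))
```

Because only input-output queries are needed, this style of estimation fits the black-box constraint the abstract describes; ZeroPrint's contribution is making such perturbations meaningful for discrete text inputs.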