The substantial investment required to develop Large Language Models (LLMs) makes them valuable intellectual property, raising significant concerns about copyright protection. LLM fingerprinting has emerged as a key technique to address this: it verifies a model's origin by extracting an intrinsic, unique signature (a "fingerprint") and comparing it to that of a source model to identify illicit copies. However, existing black-box fingerprinting methods often fail to generate distinctive LLM fingerprints. This ineffectiveness arises because black-box methods typically rely on model outputs, which lose critical information about the model's unique parameters due to the use of non-linear functions. To address this, we first leverage Fisher Information Theory to formally demonstrate that the gradient with respect to the model's input is a more informative feature for fingerprinting than the output. Based on this insight, we propose ZeroPrint, a novel method that approximates these information-rich gradients in a black-box setting using zeroth-order estimation. ZeroPrint overcomes the challenge of applying this to discrete text by simulating input perturbations via semantic-preserving word substitutions. This operation allows ZeroPrint to estimate the model's Jacobian matrix as a unique fingerprint. Experiments on the standard benchmark show that ZeroPrint achieves state-of-the-art effectiveness and robustness, significantly outperforming existing black-box methods.
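To make the core idea concrete, the sketch below illustrates generic zeroth-order (finite-difference) Jacobian estimation of a black-box function, the numerical principle the abstract invokes. This is a minimal illustration with hypothetical names (`zeroth_order_jacobian`, a toy linear "model"), not ZeroPrint itself: the actual method perturbs discrete text via semantic-preserving word substitutions rather than adding numeric offsets to a continuous input.

```python
import numpy as np

def zeroth_order_jacobian(f, x, eps=1e-4):
    """Estimate the Jacobian of a black-box vector function f at x
    using central finite differences: only evaluations of f are used,
    never its analytic gradients. Hypothetical helper for illustration."""
    x = np.asarray(x, dtype=float)
    y0 = np.asarray(f(x), dtype=float)
    jac = np.zeros((y0.size, x.size))
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps  # perturb one input coordinate at a time
        y_plus = np.asarray(f(x + e), dtype=float)
        y_minus = np.asarray(f(x - e), dtype=float)
        jac[:, i] = (y_plus - y_minus) / (2 * eps)
    return jac

# Toy black-box "model": a linear map, so the estimated Jacobian
# should recover the weight matrix W up to finite-difference error.
W = np.array([[2.0, -1.0],
              [0.5, 3.0]])
J = zeroth_order_jacobian(lambda x: W @ x, np.array([1.0, 1.0]))
```

Because only input-output queries are needed, this style of estimation fits the black-box constraint the abstract describes; ZeroPrint's contribution is making such perturbations meaningful for discrete text inputs.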