Fingerprinting LLMs via Prompt Injection

Large language models (LLMs) are often modified after release through post-processing such as post-training or quantization, which makes it challenging to determine whether one model is derived from another. Existing provenance detection methods have two main limitations: (1) they embed signals into the base model before release, which is infeasible for already published models, or (2) they compare outputs across models using hand-crafted or random prompts, which are not robust to post-processing. In this work, we propose LLMPrint, a novel detection framework that constructs fingerprints by exploiting LLMs' inherent vulnerability to prompt injection. Our key insight is that by optimizing fingerprint prompts to enforce consistent token preferences, we can obtain fingerprints that are both unique to the base model and robust to post-processing. We further develop a unified verification procedure that applies to both gray-box and black-box settings, with statistical guarantees. We evaluate LLMPrint on five base models and around 700 post-trained or quantized variants. Our results show that LLMPrint achieves high true positive rates while keeping false positive rates near zero. The code is publicly available at https://github.com/hifi-hyp/ACL-LLMPrint.
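To make the idea of verification "with statistical guarantees" concrete, here is a minimal sketch of one way a black-box check could be framed. It is not the authors' procedure: the abstract does not specify the test. The sketch assumes each fingerprint prompt forces a choice between two tokens, that an unrelated model picks the base model's preferred token with probability 0.5 independently per prompt, and that a one-sided binomial test is appropriate. The names `binomial_tail`, `verify_fingerprint`, and the threshold `alpha` are hypothetical.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float = 0.5) -> float:
    """Return P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def verify_fingerprint(expected, observed, alpha: float = 1e-3):
    """Compare a suspect model's token choices against the base model's.

    expected: token choices the base model makes on each fingerprint prompt.
    observed: token choices the suspect model makes on the same prompts.
    Assumption (not from the paper): under the null hypothesis the suspect
    model is unrelated and matches each choice with probability 0.5.
    Returns (number of matches, p-value, whether the null is rejected).
    """
    if len(expected) != len(observed):
        raise ValueError("expected and observed must have the same length")
    n = len(expected)
    matches = sum(e == o for e, o in zip(expected, observed))
    p_value = binomial_tail(matches, n)
    return matches, p_value, p_value < alpha


# Hypothetical data: 20 fingerprint prompts, each with a binary token choice.
base_choices = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1]

# A derived model that agrees on 19 of 20 prompts (last choice flipped).
derived_choices = base_choices[:-1] + [0]

# An unrelated model that agrees on exactly 10 of 20 prompts.
unrelated_choices = base_choices[:10] + [1 - b for b in base_choices[10:]]

print(verify_fingerprint(base_choices, derived_choices))
print(verify_fingerprint(base_choices, unrelated_choices))
```

Under these assumptions, `alpha` is an upper bound on the false positive rate for an unrelated model, which is the kind of guarantee such a test can offer. Real fingerprints would need a null model that reflects how unrelated models actually respond to the optimized prompts.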

