Large language models (LLMs) are considered valuable Intellectual Property (IP) for their legitimate owners due to the enormous computational cost of training. It is crucial to protect the IP of LLMs from malicious theft or unauthorized deployment. Despite existing efforts in watermarking and fingerprinting LLMs, these methods either impact the text generation process or require white-box access to the suspect model, making them impractical. Hence, we propose DuFFin, a novel $\textbf{Du}$al-Level $\textbf{Fin}$gerprinting $\textbf{F}$ramework for ownership verification in the black-box setting. DuFFin extracts trigger-pattern and knowledge-level fingerprints to identify the source of a suspect model. We conduct experiments on a variety of models collected from open-source websites, including four popular base models as protected LLMs together with their fine-tuned, quantized, and safety-aligned versions, which are released by large companies, start-ups, and individual users. Results show that our method can accurately verify the copyright of the protected base LLMs on their model variants, achieving an IP-ROC metric greater than 0.95. Our code is available at https://github.com/yuliangyan0807/llm-fingerprint.