Establishing reliable and verifiable fingerprinting mechanisms is fundamental to controlling the unauthorized redistribution of large language models (LLMs). However, existing approaches face two major challenges: (a) ensuring imperceptibility, including resistance to statistical identification and avoidance of accidental activation during fingerprint construction, and (b) preserving both model utility and fingerprint detectability under subsequent model modifications. To address these challenges, we propose an end-to-end fingerprinting framework with two components. First, we design a rule-based code-mixing fingerprint (CF) that maps natural-query-like prompts to multi-candidate targets, reducing accidental triggering via high-complexity code-mixing formulations. Second, we introduce Multi-Candidate Editing (MCEdit), which jointly optimizes multi-candidate targets and enforces margins between target and non-target outputs to improve post-modification detectability. Extensive experiments demonstrate that our framework provides a robust and practical solution for fingerprinting LLMs.
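To make the margin idea behind MCEdit concrete, the following is a minimal sketch of one possible per-position objective. The abstract does not give the actual loss, so everything here is an assumption for illustration: the function name `multi_candidate_margin_loss`, the choice of a likelihood term over the candidate set, the hinge form of the margin term, and the default margin of 1.0 are all hypothetical and are not taken from the paper.

```python
import math


def multi_candidate_margin_loss(logits, candidate_ids, margin=1.0):
    """Illustrative loss only; NOT the objective used by MCEdit.

    logits        : list of floats, one score per vocabulary entry
    candidate_ids : indices of the accepted (multi-candidate) targets
    margin        : assumed gap required between targets and non-targets
    """
    candidates = set(candidate_ids)
    if not candidates or len(candidates) >= len(logits):
        raise ValueError("need at least one target and one non-target")

    # Numerically stable log-sum-exp over the whole vocabulary.
    top = max(logits)
    log_z = top + math.log(sum(math.exp(x - top) for x in logits))

    # Log of the total unnormalized mass on the candidate set.
    cand_logits = [logits[i] for i in candidates]
    cand_top = max(cand_logits)
    log_cand = cand_top + math.log(
        sum(math.exp(x - cand_top) for x in cand_logits)
    )

    # Term 1: negative log-probability that the output is ANY candidate.
    nll = log_z - log_cand

    # Term 2: hinge penalty if the weakest candidate does not beat the
    # strongest non-candidate by at least `margin`.
    weakest_target = min(cand_logits)
    strongest_other = max(
        x for i, x in enumerate(logits) if i not in candidates
    )
    hinge = max(0.0, margin - (weakest_target - strongest_other))

    return nll + hinge
```

Under these assumptions, the loss is near zero when every candidate clearly outranks every non-candidate, and it grows when a non-target token scores close to or above a target, which is the kind of separation the margin is meant to preserve after later model modifications.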