Protecting the intellectual property of large language models requires robust ownership verification. Conventional backdoor fingerprinting, however, is flawed by a stealth-robustness paradox: to be robust, these methods force models to memorize fixed responses to high-perplexity triggers, but this targeted overfitting creates detectable statistical artifacts. We resolve this paradox with KinGuard, a framework that embeds a private knowledge corpus built on structured kinship narratives. Instead of memorizing superficial triggers, the model internalizes this knowledge via incremental pre-training, and ownership is verified by probing its conceptual understanding. Extensive experiments demonstrate KinGuard's superior effectiveness, stealth, and resilience against a battery of attacks including fine-tuning, input perturbation, and model merging. Our work establishes knowledge-based embedding as a practical and secure paradigm for model fingerprinting.