The rapid growth of large language models (LLMs) raises pressing concerns about intellectual property (IP) protection when models are deployed as black boxes. Existing backdoor-based fingerprints either rely on rare tokens, which produce high-perplexity inputs that are easily filtered, or use fixed trigger-response mappings that become brittle once the mapping leaks or the model undergoes post-hoc adaptation. We propose \textsc{Dual-Layer Nested Fingerprinting} (DNF), a black-box method that embeds a hierarchical backdoor by coupling domain-specific stylistic cues with implicit semantic triggers. Across Mistral-7B, LLaMA-3-8B-Instruct, and Falcon3-7B-Instruct, DNF achieves perfect fingerprint activation while preserving downstream utility. Compared with existing methods, DNF uses lower-perplexity triggers, remains undetectable under fingerprint detection attacks, and is relatively robust to incremental fine-tuning and model merging. These results position DNF as a practical, stealthy, and resilient solution for LLM ownership verification and IP protection.