The widespread adoption of Large Language Models (LLMs) in commercial and research settings has intensified the need for robust intellectual property protection. Backdoor-based LLM fingerprinting has emerged as a promising solution to this challenge. In practice, LLM ensembling, a low-cost multi-model collaboration technique, combines diverse LLMs to leverage their complementary strengths and has gained significant attention and adoption. Unfortunately, the vulnerability of existing LLM fingerprinting schemes in the ensemble scenario remains unexplored. To comprehensively assess the robustness of LLM fingerprinting, we propose two novel fingerprinting attack methods: the token filter attack (TFA) and the sentence verification attack (SVA). At each decoding step, TFA selects the next token from a unified token set constructed by a token filter mechanism. SVA filters out fingerprint responses through a sentence verification mechanism based on perplexity and voting. Experiments show that the proposed methods effectively suppress fingerprint responses while maintaining ensemble performance, and they outperform state-of-the-art attack methods. These findings highlight the need for more robust LLM fingerprinting.
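To make the two attacks concrete, the following is a minimal sketch of one plausible reading of the abstract, not the authors' implementation: TFA is illustrated as merging each ensemble member's top-k next-token candidates into a unified set and picking the highest average-scoring token, and SVA as discarding high-perplexity candidate responses before a majority vote. The names `top_k`, `ppl_fn`, and `ppl_threshold` are hypothetical parameters introduced for illustration only.

```python
# Hedged sketch of TFA and SVA as described at a high level in the abstract.
# The exact filtering and voting rules in the paper may differ.
from collections import Counter
import math


def tfa_next_token(member_logprobs, top_k=50):
    """Token filter attack (illustrative): build a unified candidate set from
    each ensemble member's top-k tokens, then return the token with the
    highest average log-probability across members."""
    # member_logprobs: list of dicts, each mapping token -> log-probability.
    candidate_sets = [
        set(sorted(lp, key=lp.get, reverse=True)[:top_k]) for lp in member_logprobs
    ]
    unified = set.union(*candidate_sets)  # unified token set across members

    def avg_logprob(tok):
        return sum(lp.get(tok, -math.inf) for lp in member_logprobs) / len(member_logprobs)

    return max(unified, key=avg_logprob)


def sva_filter(responses, ppl_fn, ppl_threshold=50.0):
    """Sentence verification attack (illustrative): drop candidate responses
    whose perplexity exceeds a threshold, then keep the majority-voted one."""
    kept = [r for r in responses if ppl_fn(r) <= ppl_threshold]
    if not kept:  # every response flagged -> fall back to the lowest-perplexity one
        return min(responses, key=ppl_fn)
    return Counter(kept).most_common(1)[0][0]  # majority vote among survivors
```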