The widespread adoption of Large Language Models (LLMs) in commercial and research settings has intensified the need for robust intellectual property protection. Backdoor-based LLM fingerprinting has emerged as a promising solution to this challenge. In practice, LLM ensembling, a low-cost multi-model collaboration technique, combines diverse LLMs to leverage their complementary strengths and has gained significant attention and adoption. Unfortunately, the vulnerability of existing LLM fingerprinting schemes in the ensemble scenario remains unexplored. To comprehensively assess the robustness of LLM fingerprinting, we propose two novel fingerprinting attack methods: the token filter attack (TFA) and the sentence verification attack (SVA). At each decoding step, TFA selects the next token from a unified token set constructed by a token filter mechanism. SVA filters out fingerprint responses through a sentence verification mechanism based on perplexity and voting. Experiments show that the proposed methods effectively suppress fingerprint responses while maintaining ensemble performance, and they outperform state-of-the-art attack methods. These findings highlight the need for more robust LLM fingerprinting.
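To make the two attacks concrete, the following is a minimal sketch of one plausible reading of the abstract, not the authors' implementation: TFA is illustrated as merging each ensemble member's top-k next-token candidates into a unified set and picking the highest average-scoring token, and SVA as discarding high-perplexity candidate responses before a majority vote. The names `top_k`, `ppl_fn`, and `ppl_threshold` are hypothetical parameters introduced for illustration only.

```python
# Hedged sketch of TFA and SVA as described at a high level in the abstract.
# The exact filtering and voting rules in the paper may differ.
from collections import Counter
import math


def tfa_next_token(member_logprobs, top_k=50):
    """Token filter attack (illustrative): build a unified candidate set from
    each ensemble member's top-k tokens, then return the token with the
    highest average log-probability across members."""
    # member_logprobs: list of dicts, each mapping token -> log-probability.
    candidate_sets = [
        set(sorted(lp, key=lp.get, reverse=True)[:top_k]) for lp in member_logprobs
    ]
    unified = set.union(*candidate_sets)  # unified token set across members

    def avg_logprob(tok):
        return sum(lp.get(tok, -math.inf) for lp in member_logprobs) / len(member_logprobs)

    return max(unified, key=avg_logprob)


def sva_filter(responses, ppl_fn, ppl_threshold=50.0):
    """Sentence verification attack (illustrative): drop candidate responses
    whose perplexity exceeds a threshold, then keep the majority-voted one."""
    kept = [r for r in responses if ppl_fn(r) <= ppl_threshold]
    if not kept:  # every response flagged -> fall back to the lowest-perplexity one
        return min(responses, key=ppl_fn)
    return Counter(kept).most_common(1)[0][0]  # majority vote among survivors
```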