Large language models (LLMs) face significant copyright and intellectual property challenges as training costs rise and model reuse becomes prevalent. Although watermarking techniques have been proposed to protect model ownership, they may not be robust to continued training and further development, which poses a serious threat to model attribution and copyright protection. This work introduces a simple yet effective approach to robust LLM fingerprinting based on intrinsic model characteristics. We discover that the distribution of standard deviations of the attention parameter matrices across layers exhibits distinctive patterns that remain stable even after extensive continued training. These parameter distribution signatures serve as robust fingerprints that can reliably identify model lineage and detect potential copyright infringement. Experimental validation across multiple model families demonstrates the effectiveness of our method for model authentication. Notably, our investigation uncovers evidence that the Pangu Pro MoE model recently released by Huawei is derived from the Qwen-2.5 14B model through upcycling rather than trained from scratch, highlighting potential cases of model plagiarism, copyright violation, and information fabrication. These findings underscore the importance of robust fingerprinting methods for protecting intellectual property in large-scale model development, and they show that deliberate continued training alone is insufficient to completely obscure a model's origins.
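The fingerprint described above (per-layer standard deviations of the attention parameter matrices, compared between models) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the z-normalization across layers, the use of Pearson correlation as the similarity score, the function names `std_signature` and `signature_similarity`, and the synthetic weights standing in for real checkpoints are all hypothetical choices made for the example. It also assumes both models have the same number of layers.

```python
import numpy as np

PROJECTIONS = ("q", "k", "v", "o")  # assumed names for the attention matrices


def std_signature(layers):
    """Return {projection: z-normalized per-layer std} for one model.

    `layers` is a list with one dict per transformer layer, mapping a
    projection name to a 2-D weight array.
    """
    signature = {}
    for name in sorted(layers[0].keys()):
        stds = np.array([np.std(layer[name]) for layer in layers], dtype=np.float64)
        # Assumption: normalize across layers so only the shape of the
        # curve matters, not its absolute scale.
        signature[name] = (stds - stds.mean()) / stds.std()
    return signature


def signature_similarity(sig_a, sig_b):
    """Mean Pearson correlation over the projections both signatures share."""
    shared = sorted(set(sig_a) & set(sig_b))
    corrs = [np.corrcoef(sig_a[n], sig_b[n])[0, 1] for n in shared]
    return float(np.mean(corrs))


def make_model(rng, scales, shape=(32, 32)):
    """Synthetic stand-in for a checkpoint: layer i has weight std scales[i]."""
    return [{n: rng.normal(0.0, s, size=shape) for n in PROJECTIONS} for s in scales]


def perturb(rng, layers, noise_std):
    """Crude stand-in for continued training: add small Gaussian noise."""
    return [
        {n: w + rng.normal(0.0, noise_std, size=w.shape) for n, w in layer.items()}
        for layer in layers
    ]


rng = np.random.default_rng(0)
depth = np.arange(12)
base = make_model(rng, 0.03 + 0.02 * np.sin(depth))       # "original" model
derived = perturb(rng, base, noise_std=0.002)             # base after further updates
unrelated = make_model(rng, 0.03 + 0.02 * np.cos(depth))  # different std profile

sim_derived = signature_similarity(std_signature(base), std_signature(derived))
sim_unrelated = signature_similarity(std_signature(base), std_signature(unrelated))
print(f"base vs derived:   {sim_derived:.3f}")
print(f"base vs unrelated: {sim_unrelated:.3f}")
```

With real checkpoints, the per-layer dicts would be filled from the model's attention projection weights instead of `make_model`. A high correlation in this toy setting only shows that the signature survives small perturbations; it says nothing on its own about how the score behaves on actual models.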