The development of large language models (LLMs) is costly and has significant commercial value. Consequently, preventing unauthorized appropriation of open-source LLMs and protecting developers' intellectual property rights have become critical challenges. In this work, we propose the Functional Network Fingerprint (FNF), a training-free, sample-efficient method for detecting whether a suspect LLM is derived from a victim model, based on the consistency of functional-network activity between the two models. We demonstrate that models sharing a common origin, even with differences in scale or architecture, exhibit highly consistent patterns of neuronal activity within their functional networks across diverse input samples. In contrast, models trained independently on distinct data or with different objectives do not preserve such activity alignment. Unlike conventional approaches, our method requires only a few samples for verification, preserves model utility, and remains robust to common model modifications (such as fine-tuning, pruning, and parameter permutation), as well as to comparisons across diverse architectures and dimensionalities. FNF thus provides model owners and third parties with a simple, non-invasive, and effective tool for protecting LLM intellectual property. The code is available at https://github.com/WhatAboutMyStar/LLM_ACTIVATION.
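To make the core idea concrete, the following is a minimal, self-contained sketch of the kind of consistency check the abstract describes. The paper's actual fingerprint construction is not reproduced here; this sketch substitutes a hypothetical top-k activation mask and a Jaccard-overlap score, with synthetic activation matrices standing in for real per-sample neuron activations extracted from two models. All function names and parameters are illustrative.

```python
import numpy as np

def activity_fingerprint(activations, top_k):
    """Binary mask marking the top-k most strongly activated neurons
    for each input sample (rows = samples, columns = neurons)."""
    idx = np.argsort(-np.abs(activations), axis=1)[:, :top_k]
    mask = np.zeros_like(activations, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    return mask

def fingerprint_consistency(acts_a, acts_b, top_k=32):
    """Mean per-sample Jaccard overlap between the two models'
    activity masks; higher means more consistent functional activity."""
    ma = activity_fingerprint(acts_a, top_k)
    mb = activity_fingerprint(acts_b, top_k)
    inter = (ma & mb).sum(axis=1)
    union = (ma | mb).sum(axis=1)
    return float((inter / union).mean())

# Synthetic stand-ins: a "victim" model's activations, a lightly
# perturbed "derived" model, and an unrelated "independent" model.
rng = np.random.default_rng(0)
victim = rng.normal(size=(8, 256))                        # 8 samples x 256 neurons
derived = victim + 0.05 * rng.normal(size=victim.shape)   # fine-tune-like perturbation
independent = rng.normal(size=victim.shape)

print(fingerprint_consistency(victim, derived))      # high overlap
print(fingerprint_consistency(victim, independent))  # near-chance overlap
```

Under this toy setup, a model derived from the victim scores far higher than an independently trained one, which mirrors the separation the abstract claims FNF achieves with only a few verification samples.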