Protecting the intellectual property of open-source Large Language Models (LLMs) is crucial, because training LLMs demands extensive computational resources and data. Model owners and third parties therefore need to determine whether a suspect model is a subsequent development of a victim model. To this end, we propose REEF, a training-free method that identifies the relationship between a suspect model and a victim model from the perspective of LLMs' feature representations. Specifically, REEF computes and compares the centered kernel alignment (CKA) similarity between the representations of the two models on the same samples. Because it is training-free, REEF does not impair the model's general capabilities, and it is robust to sequential fine-tuning, pruning, model merging, and permutations. REEF thus provides a simple and effective way for third parties and model owners to jointly protect LLMs' intellectual property. The code is available at https://github.com/tmylla/REEF.
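To make the core comparison concrete, below is a minimal sketch of linear CKA, one common variant of centered kernel alignment, computed between two representation matrices (rows are the same samples, columns are each model's features). This is an illustrative implementation under that assumption, not necessarily the exact kernel choice used by REEF; the function name `linear_cka` is ours. Note that CKA is invariant to orthogonal transformations of the feature space, which includes neuron permutations, consistent with the robustness claim above.

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between representations X (n x d1) and Y (n x d2),
    where each row is the same input sample fed to each model."""
    # Center each feature dimension across samples.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # HSIC-based formula for the linear kernel:
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return float(cross / (norm_x * norm_y))

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 32))          # victim-model representations
Q, _ = np.linalg.qr(rng.normal(size=(32, 32)))  # random orthogonal map

print(linear_cka(X, X))        # identical representations -> 1.0
print(linear_cka(X, X @ Q))    # orthogonal transform (e.g. permutation) -> ~1.0
```

A high CKA score between the suspect and victim models' representations on shared samples then serves as evidence that the suspect model derives from the victim model.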