The wide applicability and adaptability of generative large language models (LLMs) have enabled their rapid adoption. While pre-trained models can perform many tasks, they are often fine-tuned to improve their performance on downstream applications. However, this raises concerns about model license violations, model theft, and copyright infringement. Moreover, recent advances show that generative technology is capable of producing harmful content, which exacerbates the problem of accountability within model supply chains. We therefore need a method to investigate how a model was trained, or how a piece of text was generated, and which pre-trained base model it derives from. In this paper we take the first step toward addressing this open problem by tracing a given fine-tuned LLM back to its corresponding pre-trained base model. We consider different knowledge levels and attribution strategies, and find that our best method correctly traces back 8 of the 10 fine-tuned models.