The extraordinary performance of large language models (LLMs) heightens the importance of detecting whether a given text was generated by an AI system. Moreover, as more and more companies and institutions release their own LLMs, the origin of a generated text can be hard to trace. As LLMs move toward the era of AGI, tracing their origin becomes highly important, much like origin tracing in anthropology. In this paper, we are the first to raise the problem of LLM origin tracing, and we propose an effective method to trace and detect AI-generated texts. We introduce a novel algorithm that leverages the contrastive features between LLMs and extracts model-wise features to trace text origins. Our method works under both white-box and black-box settings and can therefore be widely generalized to detect various LLMs (e.g., it can detect text from GPT-3 models without access to the GPT-3 models). In addition, our method requires only limited data compared with supervised learning methods and can be extended to trace the origins of newly released models. We conduct extensive experiments to examine whether we can trace the origins of given texts. Based on the experimental results, we provide valuable observations, such as the difficulty level of AI origin tracing and the similarities between AI origins, and we call on LLM providers to attend to ethical concerns. We are releasing all code and data as a toolkit and benchmark for future studies on AI origin tracing and detection.\footnote{We are releasing all available resources at \url{https://github.com/OpenLMLab/}.}