A Training-free Method for LLM Text Attribution

Verifying the provenance of content is crucial to the functioning of many organizations, e.g., educational institutions, social media platforms, and firms. This problem is becoming increasingly challenging as text generated by Large Language Models (LLMs) becomes almost indistinguishable from human-generated content. In addition, many institutions use in-house LLMs and want to ensure that external, non-sanctioned LLMs do not produce content within their institutions. In this paper, we answer the following question: Given a piece of text, can we identify whether it was produced by a particular LLM while guaranteeing a low false positive rate? We model LLM-generated text as a sequential stochastic process with complete dependence on history. We then design zero-shot statistical tests to (i) distinguish between text generated by two different known sets of LLMs, $A$ (non-sanctioned) and $B$ (in-house), and (ii) identify whether text was generated by a known LLM or by any unknown model. We prove that the Type I and Type II error probabilities of our tests decrease exponentially with the length of the text. We also extend our theory to black-box access via sampling and characterize the sample size required to obtain essentially the same Type I and Type II error upper bounds as in the white-box setting (i.e., with direct access to $A$). We show that our upper bounds are tight by providing an information-theoretic lower bound. We then present numerical experiments that validate our theoretical results and assess their robustness in settings with adversarial post-editing. Our work has many practical applications in which determining the origin of a text is important, and it can also help combat misinformation and ensure compliance with emerging AI regulations. See https://github.com/TaraRadvand74/llm-text-detection for code, data, and an online demo of the project.
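To make the exponential error decay concrete, here is a minimal Python sketch of one zero-shot test in this spirit. It is not necessarily the statistic used in the paper. It assumes white-box access to the next-token probabilities of a single model $A$. The statistic is the per-token average of the surprisal $-\log p_A(x_t \mid x_{<t})$ minus the entropy of $p_A(\cdot \mid x_{<t})$. If $A$ generated the text, each term has zero conditional mean given the history, so the running sum is a martingale even though every token depends on the full history. If every probability of $A$ is at least $p_{\min}$, each increment is bounded by $c = \log(1/p_{\min})$. The Azuma-Hoeffding inequality then bounds the probability of wrongly rejecting $A$'s own text by $2\exp(-n\varepsilon^2/(2c^2))$, which decays exponentially in the text length $n$. The toy models `model_a` and `model_b`, all helper names, and the threshold rule are hypothetical illustrations.

```python
import math
import random
from typing import Callable, List, Sequence

# A "model" maps a token history to a next-token distribution (white-box access assumed).
Model = Callable[[Sequence[int]], List[float]]


def model_a(history: Sequence[int]) -> List[float]:
    """Hypothetical model A: a toy 3-token chain conditioned on the last token."""
    prev = history[-1] if history else 0
    return [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]][prev]


def model_b(history: Sequence[int]) -> List[float]:
    """Hypothetical other model B: favors tokens that model_a finds unlikely."""
    prev = history[-1] if history else 0
    return [[0.05, 0.05, 0.9], [0.9, 0.05, 0.05], [0.45, 0.45, 0.1]][prev]


def sample(model: Model, n: int, rng: random.Random) -> List[int]:
    """Generate n tokens autoregressively from `model`."""
    tokens: List[int] = []
    for _ in range(n):
        probs = model(tokens)
        tokens.append(rng.choices(range(len(probs)), weights=probs)[0])
    return tokens


def score_text(tokens: Sequence[int], model: Model) -> float:
    """Average of (surprisal - entropy); has mean zero when `model` wrote the text."""
    total = 0.0
    for t, tok in enumerate(tokens):
        probs = model(tokens[:t])
        surprisal = -math.log(probs[tok])
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        total += surprisal - entropy
    return total / len(tokens)


def azuma_threshold(n: int, c: float, alpha: float) -> float:
    """Smallest eps with 2 * exp(-n * eps**2 / (2 * c**2)) <= alpha."""
    return c * math.sqrt(2 * math.log(2 / alpha) / n)


def attributed_to(tokens: Sequence[int], model: Model, eps: float) -> bool:
    """Accept 'written by model' unless the score deviates from zero by more than eps."""
    return abs(score_text(tokens, model)) <= eps


rng = random.Random(0)
n = 2000
# Every probability in model_a is >= 0.1, so each increment is bounded by log(1 / 0.1).
eps = azuma_threshold(n, c=math.log(1 / 0.1), alpha=0.01)
text_a = sample(model_a, n, rng)
text_b = sample(model_b, n, rng)
print(f"eps={eps:.4f}")
print("text_a attributed to A:", attributed_to(text_a, model_a, eps))
print("text_b attributed to A:", attributed_to(text_b, model_a, eps))
```

With these toy models, text sampled from `model_a` should fall well inside the threshold, while text from `model_b` should score far from zero and be rejected. Subtracting the entropy is what makes every term zero-mean under $A$ regardless of the history. Azuma-Hoeffding is a conservative choice, and tighter martingale bounds would give smaller thresholds. The black-box setting, where $A$ can only be sampled, would need sample-based estimates of these quantities. That variant is not shown here.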
