Did the Neurons Read your Book? Document-level Membership Inference for Large Language Models

With large language models (LLMs) poised to become embedded in our daily lives, questions are starting to be raised about the datasets they were trained on. These questions range from the bias or misinformation LLMs could retain from their training data to copyright and the fair use of human-generated text. Yet even as these questions emerge, developers of recent state-of-the-art LLMs have become increasingly reluctant to disclose details of their training corpora. Here we introduce the task of document-level membership inference for real-world LLMs, i.e. inferring whether an LLM has seen a given document during training. First, we propose a procedure for developing and evaluating document-level membership inference for LLMs that leverages commonly used training data sources and the model release date. We then propose a practical, black-box method to predict document-level membership and instantiate it on OpenLLaMA-7B with both books and academic papers. Our method performs well, reaching an AUC of 0.856 for books and 0.678 for papers. We then show that, for document-level membership, our approach outperforms the sentence-level membership inference attacks used in the privacy literature. Finally, we evaluate whether smaller models might be less sensitive to document-level inference and find OpenLLaMA-3B to be approximately as sensitive to our approach as OpenLLaMA-7B. Taken together, our results show that document-level membership can be accurately inferred for LLMs, increasing the transparency of a technology poised to change our lives.
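The AUC figures above summarize how well membership scores separate member from non-member documents: an AUC of 0.856 means that, on the evaluation set, a randomly chosen member document receives a higher score than a randomly chosen non-member about 85.6% of the time, while 0.5 corresponds to random guessing. The minimal sketch below computes ROC AUC through this pairwise (Mann-Whitney) interpretation. It assumes each document has already been assigned a single membership score; how those scores are produced by the black-box method is not shown, and the function name `membership_auc` and the example scores are hypothetical, for illustration only.

```python
from itertools import product


def membership_auc(member_scores, nonmember_scores):
    """ROC AUC as the probability that a randomly chosen member document
    scores higher than a randomly chosen non-member (ties count as 1/2).

    Hypothetical helper: scores are assumed to come from some document-level
    membership inference method (higher score = more likely a member).
    """
    if not member_scores or not nonmember_scores:
        raise ValueError("need at least one member and one non-member score")
    wins = 0.0
    # Compare every (member, non-member) pair of documents.
    for m, n in product(member_scores, nonmember_scores):
        if m > n:
            wins += 1.0
        elif m == n:
            wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


# Hypothetical scores, not results from this work.
members = [0.9, 0.8, 0.55, 0.4]
non_members = [0.7, 0.3, 0.2, 0.1]
print(membership_auc(members, non_members))  # 14 of 16 pairs ranked correctly -> 0.875
```

The pairwise loop is quadratic in the number of documents, which is fine for a few thousand documents; for larger evaluation sets a rank-based computation (or a library routine such as scikit-learn's `roc_auc_score`) gives the same value more efficiently.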

翻译：大语言模型即将融入我们的日常生活，由此引发关于其训练数据集的一系列问题：从模型可能从训练数据中保留的潜在偏见或错误信息，到人类生成文本的版权与合理使用问题。然而，在相关问题日益凸显之际，最新一代大语言模型的开发者却愈发不愿公开其训练语料库的细节。本文提出面向真实世界大语言模型的文档级成员推理任务——即推断给定文档是否属于模型训练数据。首先，我们通过利用常用训练数据源及模型发布时间节点，设计了适用于大语言模型文档级成员推理的开发与评估流程。随后提出一种实用的黑盒方法用于预测文档级成员关系，并在OpenLLaMA-7B模型上针对书籍和学术论文两类语料进行实例化验证。实验表明，该方法表现优异：针对书籍的AUC值达0.856，论文为0.678。进一步对比显示，本文方法在文档级成员任务上优于隐私文献中常用的句子级成员推理攻击。最后，我们考察了较小模型对文档级推理的敏感性，发现OpenLLaMA-3B与OpenLLaMA-7B对该方法的敏感度基本相当。综合而言，我们的研究证实了大语言模型可实现精准的文档级成员推理，从而增强这项将改变人类生活的技术的透明度。