Large Language Models (LLMs) are trained on massive web-crawled corpora, which poses risks of leakage, including personal information, copyrighted texts, and benchmark datasets. Such leakage can undermine human trust in AI through unauthorized generation of content or overestimation of performance. We establish three criteria concerning leakage issues: (1) leakage rate: the proportion of leaked data in the training data; (2) output rate: the ease of generating leaked data; and (3) detection rate: the performance of detecting leaked versus non-leaked data. Although the leakage rate is the origin of data leakage issues, how it affects the output rate and detection rate is not well understood. In this paper, we conduct an experimental survey to elucidate the relationship between the leakage rate and both the output rate and the detection rate for personal information, copyrighted texts, and benchmark data. In addition, we propose a self-detection approach that uses few-shot learning, in which LLMs detect whether instances are present or absent in their training data, in contrast to previous methods that do not employ explicit learning. To explore the ease of generating leaked information, we create a dataset of prompts designed to elicit personal information, copyrighted text, and benchmarks from LLMs. Our experiments reveal that LLMs produce leaked information in most cases, despite such data constituting only a small fraction of their training sets, indicating that even small amounts of leaked data can greatly affect outputs. Our self-detection method outperforms existing detection methods.
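The few-shot self-detection approach described above can be sketched as a prompt-construction step: the model is shown labeled examples of member (leaked) and non-member instances before being asked about a query instance. This is a minimal illustrative sketch only; the function name, prompt wording, and example texts are assumptions, not the authors' exact setup.

```python
def build_self_detection_prompt(demonstrations, query):
    """Build a few-shot prompt asking an LLM whether `query` appeared in its
    training data. `demonstrations` is a list of (text, is_member) pairs.
    (Hypothetical sketch; not the paper's exact prompt format.)"""
    lines = [
        "Answer whether the following text appeared in your training data.",
        "Reply with 'yes' or 'no'.",
        "",
    ]
    # Labeled demonstrations give the model explicit examples of
    # member vs. non-member instances (the "explicit learning" step).
    for text, is_member in demonstrations:
        lines.append(f"Text: {text}")
        lines.append(f"Answer: {'yes' if is_member else 'no'}")
        lines.append("")
    # The unlabeled query instance the model must classify.
    lines.append(f"Text: {query}")
    lines.append("Answer:")
    return "\n".join(lines)

# Illustrative usage with made-up demonstration instances:
demos = [
    ("Call me Ishmael.", True),             # assumed member example
    ("A freshly written test sentence.", False),  # assumed non-member example
]
prompt = build_self_detection_prompt(demos, "It was a bright cold day in April.")
```

The returned prompt would then be sent to the LLM under study, and its yes/no continuation read off as the membership prediction.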