Large Language Models (LLMs) draw on extensive knowledge databases and show powerful text generation ability. However, their reliance on high-quality copyrighted datasets raises concerns about copyright infringement in generated texts. Current research often employs prompt engineering or semantic classifiers to identify copyrighted content, but these approaches have two significant limitations: (1) they struggle to identify which specific sub-dataset (e.g., works from a particular author) influences an LLM's output, and (2) they treat the entire training database as copyrighted, overlooking the inclusion of non-copyrighted training data. We propose Inner-Probe, a lightweight framework designed to evaluate the influence of copyrighted sub-datasets on LLM-generated texts. Unlike traditional methods that rely solely on text, we find that the multi-head attention (MHA) results produced during LLM output generation carry more effective information. Inner-Probe therefore performs sub-dataset contribution analysis with a lightweight LSTM-based network trained on MHA results in a supervised manner. Harnessing this prior, Inner-Probe enables non-copyrighted text detection through a concatenated global projector trained with unsupervised contrastive learning. Inner-Probe achieves 3x higher efficiency than semantic-model training for sub-dataset contribution analysis on Books3, improves accuracy over baselines by 15.04%-58.7% on the Pile, and delivers a 0.104 increase in AUC for non-copyrighted data filtering.
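To make the sub-dataset contribution analysis concrete, below is a minimal sketch of a lightweight LSTM classifier trained on MHA outputs in a supervised manner, as described above. It assumes PyTorch; the class name, dimensions, and training data are placeholders for illustration, not the authors' implementation.

```python
# Hypothetical sketch: a lightweight LSTM classifier over per-token
# multi-head attention (MHA) outputs, used to attribute generated text
# to one of K copyrighted sub-datasets. Names and dimensions are assumed.
import torch
import torch.nn as nn

class MHAProbe(nn.Module):
    def __init__(self, mha_dim: int, hidden_dim: int, num_subdatasets: int):
        super().__init__()
        # mha_dim: dimensionality of the MHA output vector collected per token
        self.lstm = nn.LSTM(mha_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_subdatasets)

    def forward(self, mha_seq: torch.Tensor) -> torch.Tensor:
        # mha_seq: (batch, seq_len, mha_dim) MHA results recorded while the
        # LLM generates its output text
        _, (h_n, _) = self.lstm(mha_seq)
        return self.classifier(h_n[-1])  # (batch, num_subdatasets) logits

# Supervised training step on MHA features labeled with their source sub-dataset
probe = MHAProbe(mha_dim=4096, hidden_dim=256, num_subdatasets=10)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

features = torch.randn(8, 128, 4096)   # placeholder MHA features
labels = torch.randint(0, 10, (8,))    # placeholder sub-dataset labels
loss = loss_fn(probe(features), labels)
loss.backward()
optimizer.step()
```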