Questions of fair use of copyright-protected content to train Large Language Models (LLMs) are being actively debated. Document-level inference has been proposed as a new task: inferring, from black-box access to the trained model, whether a piece of content was seen during training. State-of-the-art methods, however, rely on naturally occurring memorization of (part of) the content. While they are very effective against models that memorize a lot, we hypothesize (and later confirm) that they will not work against models that do not naturally memorize, e.g., medium-size 1B models. Here we propose to use copyright traps, the inclusion of fictitious entries in original content, to detect the use of copyrighted materials in LLMs, with a focus on models where memorization does not naturally occur. We carefully design an experimental setup, randomly inserting traps into original content (books) and training a 1.3B LLM. We first validate that the use of content in our target model would be undetectable using existing methods. We then show, contrary to intuition, that even medium-length trap sentences repeated a significant number of times (100) are not detectable using existing methods. However, we show that longer sequences repeated a large number of times can be reliably detected (AUC=0.75) and used as copyright traps. We further improve these results by studying how the number of times a sequence is seen improves detectability, how sequences with higher perplexity tend to be memorized more, and how taking context into account further improves detectability.
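The setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `inject_trap` and `auc`, the word-boundary insertion strategy, and the convention that a higher score means "more likely seen in training" (for example, a negated perplexity under the target model) are all assumptions made for illustration.

```python
import random


def inject_trap(text: str, trap: str, n_repeats: int, seed: int = 0) -> str:
    """Insert `trap` at `n_repeats` random word boundaries of `text`.

    Hypothetical helper: the abstract only says traps are inserted
    randomly; the exact placement rule is an assumption.
    """
    rng = random.Random(seed)
    words = text.split()
    # Draw positions in the original word list, then insert from the
    # highest index down so earlier inserts do not shift later ones.
    positions = sorted(
        (rng.randint(0, len(words)) for _ in range(n_repeats)),
        reverse=True,
    )
    for pos in positions:
        words.insert(pos, trap)
    return " ".join(words)


def auc(member_scores, non_member_scores) -> float:
    """Area under the ROC curve for a membership score.

    Computed as the probability that a randomly chosen member (a trap
    that was in the training data) scores higher than a randomly chosen
    non-member, with ties counted as one half.
    """
    wins = 0.0
    for m in member_scores:
        for n in non_member_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(non_member_scores))
```

Under these assumptions, a rights holder would call `inject_trap` on each book before publication, later score both injected and held-out trap sequences against the suspect model, and pass the two score lists to `auc`. A value near 0.5 means the traps are indistinguishable from unseen sequences; the abstract reports 0.75 for long, heavily repeated traps.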