Combating Data Laundering in LLM Training

Data rights owners can detect unauthorized data use in large language model (LLM) training by querying the model with their proprietary samples. Superior performance on a sample (e.g., higher confidence or lower loss) relative to data the model was not trained on often implies that the sample was part of the training corpus, since LLMs tend to perform better on data they have seen during training. However, this detection becomes fragile under data laundering, the practice of transforming the stylistic form of proprietary data while preserving its critical information in order to obfuscate its provenance. When an LLM is trained exclusively on such laundered variants, it no longer performs better on the originals, which erases the signals that standard detection methods rely on. We counter this by inferring the unknown laundering transformation from black-box access to the target LLM and, via an auxiliary LLM, synthesizing queries that mimic the laundered data, even when rights owners hold only the originals. Because the space of possible laundering transformations is infinite, we abstract a transformation into a high-level goal (e.g., "lyrical rewriting") and concrete details (e.g., "with vivid imagery"), and introduce synthesis data reversion (SDR), which instantiates this abstraction. SDR first identifies the most probable goal to narrow the search; it then iteratively refines the details so that the synthesized queries gradually elicit stronger detection signals from the target LLM. Evaluated on the MIMIR benchmark against diverse laundering practices and target LLM families (Pythia, Llama2, and Falcon), SDR consistently strengthens data misuse detection, providing a practical countermeasure to data laundering.
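The two-stage search described above (pick a goal, then refine details) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the greedy refinement rule, the use of mean loss as the detection signal, and every name in it (`sdr_search`, `loss_fn`, `rewrite`, `toy_rewrite`, `toy_loss`) are assumptions made for illustration. In practice `loss_fn` would wrap a black-box query to the target LLM and `rewrite` would call an auxiliary LLM; here both are replaced by toy stand-ins so the block runs on its own.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces (assumptions, not taken from the paper):
#   LossFn:   text -> loss of the target LLM on that text (black-box query)
#   Rewriter: (text, instruction) -> text rewritten by an auxiliary LLM
LossFn = Callable[[str], float]
Rewriter = Callable[[str, str], str]


def sdr_search(
    originals: Sequence[str],
    goals: Sequence[str],
    details: Sequence[str],
    loss_fn: LossFn,
    rewrite: Rewriter,
    rounds: int = 3,
) -> Tuple[str, float]:
    """Pick the goal with the lowest mean loss, then greedily add details."""

    def score(instruction: str) -> float:
        # Synthesize one query per original and average the target's loss.
        queries = [rewrite(text, instruction) for text in originals]
        return sum(loss_fn(q) for q in queries) / len(queries)

    # Stage 1: choose the high-level goal that elicits the strongest signal
    # (assumed here to mean the lowest mean loss).
    best_instruction = min(goals, key=score)
    best_score = score(best_instruction)

    # Stage 2: greedily append the detail that lowers the loss the most,
    # stopping as soon as no remaining detail improves the score.
    remaining: List[str] = list(details)
    for _ in range(rounds):
        if not remaining:
            break
        cand_score, cand_detail = min(
            (score(f"{best_instruction}, {d}"), d) for d in remaining
        )
        if cand_score >= best_score:
            break
        best_instruction = f"{best_instruction}, {cand_detail}"
        best_score = cand_score
        remaining.remove(cand_detail)

    return best_instruction, best_score


# Toy stand-ins so the sketch runs without any model (purely illustrative).
def toy_rewrite(text: str, instruction: str) -> str:
    # A real rewriter would restyle the text; this one only tags it.
    return f"[{instruction}] {text}"


def toy_loss(text: str) -> float:
    # Pretend the target was trained on lyrical, vivid rewrites.
    loss = 3.0
    if "lyrical rewriting" in text:
        loss -= 1.0
    if "vivid imagery" in text:
        loss -= 0.5
    if "formal tone" in text:
        loss += 0.2
    return loss


if __name__ == "__main__":
    instruction, loss = sdr_search(
        originals=["sample one", "sample two"],
        goals=["news-style summary", "lyrical rewriting"],
        details=["with formal tone", "with vivid imagery"],
        loss_fn=toy_loss,
        rewrite=toy_rewrite,
    )
    print(instruction, loss)
```

With the toy stand-ins, the search settles on the goal and detail that the toy loss rewards. A real deployment would need a properly calibrated detection signal, for example a comparison against reference data the target is known not to have seen, instead of raw loss alone.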
