The impressive performance of large language models (LLMs) and their immense potential for commercialization have given rise to serious concerns over the intellectual property (IP) of their training data. In particular, the synthetic texts generated by LLMs may infringe the IP of the data used to train them. It is therefore imperative to be able to (a) identify the data provider who contributed to the generation of a synthetic text by an LLM (source attribution) and (b) verify whether the text data from a data provider has been used to train an LLM (data provenance). In this paper, we show that both problems can be solved by watermarking, i.e., by enabling an LLM to generate synthetic texts with embedded watermarks that contain information about their source(s). We identify the key properties of such watermarking frameworks (e.g., source attribution accuracy, robustness against adversaries), and propose a WAtermarking for Source Attribution (WASA) framework that satisfies these key properties through its algorithmic designs. Our WASA framework enables an LLM to learn an accurate mapping from the texts of different data providers to their corresponding unique watermarks, which lays the foundation for effective source attribution (and hence data provenance). Extensive empirical evaluations show that our WASA framework achieves effective source attribution and data provenance.