Large Language Models (LLMs) have demonstrated superior performance in various natural language processing tasks. However, they require extensive training data, which raises concerns about dataset copyright protection. Backdoor-based watermarking is a viable approach to protecting the copyright of classification datasets, but such methods may allow attackers to induce malicious misclassification behaviors in watermarked LLMs, and they also alter the semantic information of the watermarked text. To address these issues, we propose FunctionMarker, a novel copyright protection method for language datasets based on knowledge injection. FunctionMarker enables LLMs to learn specific knowledge through fine-tuning on watermarked datasets, and the embedded watermark is extracted from the LLMs' responses to queries related to that knowledge. Considering watermark capacity and stealthiness, we select customizable functions as the specific knowledge for LLMs to learn and embed the watermark into them. Moreover, FunctionMarker can embed multi-bit watermarks while preserving the original semantic information, thereby increasing the difficulty of adaptive attacks. We take mathematical functions as an instance to evaluate FunctionMarker, and experiments show that a watermarked-text proportion of only 0.3% achieves 90% watermark extraction accuracy in most cases, validating the effectiveness of our method.
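To make the idea of embedding a multi-bit watermark into a customizable function concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes a linear function f(x) = a*x + b whose two coefficients each carry 4 watermark bits, appends watermark samples to the dataset at a fixed ratio, and replaces the fine-tuned LLM with a stand-in callable. The encoding, the sample format, the insertion strategy, and all names (`bits_to_coeffs`, `inject`, `extract_bits`, `query_fn`) are hypothetical and chosen only for illustration.

```python
import random

BITS_PER_COEFF = 4  # assumption: each coefficient carries 4 watermark bits


def bits_to_coeffs(bits):
    """Split a bit string into integer coefficients (hypothetical encoding)."""
    if len(bits) % BITS_PER_COEFF != 0:
        raise ValueError("bit string length must be a multiple of BITS_PER_COEFF")
    return [int(bits[i:i + BITS_PER_COEFF], 2)
            for i in range(0, len(bits), BITS_PER_COEFF)]


def coeffs_to_bits(coeffs):
    """Inverse of bits_to_coeffs; values are clamped so noisy answers still decode."""
    top = 2 ** BITS_PER_COEFF - 1
    return "".join(format(max(0, min(top, c)), f"0{BITS_PER_COEFF}b") for c in coeffs)


def watermark_samples(name, a, b, n, seed=0):
    """Generate n text samples that state values of name(x) = a*x + b."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = rng.randint(0, 20)
        samples.append(f"{name}({x}) = {a * x + b}")
    return samples


def inject(dataset, name, a, b, ratio=0.003, seed=0):
    """Append watermark samples (about `ratio` of the dataset size) and shuffle."""
    n = max(1, round(ratio * len(dataset)))
    mixed = list(dataset) + watermark_samples(name, a, b, n, seed)
    random.Random(seed).shuffle(mixed)
    return mixed


def extract_bits(query_fn):
    """Recover (a, b) from the answers to f(0) and f(1), then decode the bits.

    query_fn stands in for querying the fine-tuned LLM and parsing its answer.
    """
    f0 = query_fn(0)
    f1 = query_fn(1)
    return coeffs_to_bits([f1 - f0, f0])


def bit_accuracy(expected, recovered):
    """Fraction of watermark bits recovered correctly."""
    return sum(e == r for e, r in zip(expected, recovered)) / len(expected)


watermark = "10110010"
a, b = bits_to_coeffs(watermark)
clean = [f"ordinary training text {i}" for i in range(1000)]
marked = inject(clean, "f", a, b)

# Stand-in for a model that has learned the function perfectly.
recovered = extract_bits(lambda x: a * x + b)
accuracy = bit_accuracy(watermark, recovered)
```

In this sketch the original texts are left unmodified and the watermark lives only in the appended function samples, which mirrors the abstract's claim of preserving semantic information. With a real model, `query_fn` would need to prompt the LLM and parse a number from its response, and the bit accuracy would generally fall below 1.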