Large Language Models (LLMs) have demonstrated superior performance across a wide range of natural language processing tasks. At the same time, they require extensive training data, which raises concerns about dataset copyright protection. Backdoor-based watermarking is a viable approach to protecting the copyright of classification datasets. However, attackers can exploit these methods to introduce malicious misclassification behaviors into watermarked LLMs, and the methods also affect the semantic information of the watermarked text. To address these issues, we propose FunctionMarker, a novel copyright protection method for language datasets based on knowledge injection. FunctionMarker enables LLMs to learn specific knowledge by fine-tuning on watermarked datasets, and the embedded watermark can be extracted by obtaining the LLMs' responses to queries related to that knowledge. Considering watermark capacity and stealthiness, we select customizable functions as the specific knowledge for LLMs to learn and embed the watermark into them. Moreover, FunctionMarker can embed multi-bit watermarks while preserving the original semantic information, thereby increasing the difficulty of adaptive attacks. We take mathematical functions as an instance to evaluate the effectiveness of FunctionMarker, and experiments show that with watermarked text making up only 0.3% of the dataset, watermark extraction accuracy reaches 90% in most cases, validating the effectiveness of our method.
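To make the idea of hiding a multi-bit watermark inside a customizable function more concrete, here is a toy sketch built entirely on our own assumptions. It is not FunctionMarker's actual construction. We assume the watermark bits are packed into the integer coefficients of a polynomial with a hypothetical name `mf`, that a few question-answer texts about `mf` are mixed into the dataset, and that `query_model` stands in for prompting a suspect LLM and parsing the integer in its answer. Extraction queries `mf` at `x = BASE` and reads the coefficients back as base-`BASE` digits. This works only because every coefficient is smaller than `BASE`.

```python
import random

BITS_PER_COEF = 4               # assumption: each coefficient carries 4 watermark bits
BASE = 2 ** BITS_PER_COEF       # every coefficient lies in [0, BASE)


def bits_to_coefficients(bits):
    """Pack a bit string such as '1011...' into coefficients c0, c1, c2, ..."""
    assert len(bits) % BITS_PER_COEF == 0
    return [int(bits[i:i + BITS_PER_COEF], 2) for i in range(0, len(bits), BITS_PER_COEF)]


def make_function(coefs):
    """Return the custom function mf(x) = c0 + c1*x + c2*x^2 + ..."""
    return lambda x: sum(c * x ** k for k, c in enumerate(coefs))


def watermark_texts(f, n, seed=0):
    """Generate n question-answer texts about mf to insert into the dataset (hypothetical template)."""
    rng = random.Random(seed)
    texts = []
    for _ in range(n):
        x = rng.randint(-50, 50)
        texts.append(f"Q: What is mf({x})? A: mf({x}) = {f(x)}.")
    return texts


def extract_bits(query_model, n_coefs):
    """Ask the (stand-in) model for mf(BASE) and decode its base-BASE digits back into bits."""
    value = query_model(BASE)
    bits = ""
    for _ in range(n_coefs):
        value, digit = divmod(value, BASE)   # the lowest digit is c0, then c1, ...
        bits += format(digit, f"0{BITS_PER_COEF}b")
    return bits


def bit_accuracy(expected, extracted):
    """Fraction of watermark bits recovered correctly."""
    return sum(a == b for a, b in zip(expected, extracted)) / len(expected)


if __name__ == "__main__":
    watermark = "1011001011110001"
    coefs = bits_to_coefficients(watermark)      # [11, 2, 15, 1]
    mf = make_function(coefs)
    for text in watermark_texts(mf, 3):
        print(text)
    # A model that has fully learned mf recovers every bit.
    print(bit_accuracy(watermark, extract_bits(mf, len(coefs))))               # 1.0
    # A clean model that never saw the watermark texts (here: always answers 0).
    print(bit_accuracy(watermark, extract_bits(lambda x: 0, len(coefs))))      # 0.4375
```

In this sketch, the gap between the two accuracies is what would separate a model fine-tuned on the watermarked dataset from an unrelated one. A real evaluation would replace `query_model` with actual LLM queries and would tolerate imperfectly learned answers.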