EditMark: Watermarking Large Language Models based on Model Editing

Large Language Models (LLMs) have demonstrated remarkable capabilities, but their training requires extensive data and computational resources, rendering them valuable digital assets. Therefore, it is essential to watermark LLMs to protect their copyright and trace unauthorized use or resale. Existing methods for watermarking LLMs primarily rely on training LLMs with a watermarked dataset, which entails burdensome training costs and negatively impacts the LLM's performance. In addition, their watermarked texts are not logical or natural, thereby reducing the stealthiness of the watermark. To address these issues, we propose EditMark, the first watermarking method that leverages model editing to embed a training-free, stealthy, and performance-lossless watermark for LLMs. We observe that some questions have multiple correct answers. Therefore, we assign each answer a unique watermark and use model editing to update the weights of the LLM so that it produces the corresponding answer to each question. In addition, we refine the model editing technique to align with the requirements of watermark embedding. Specifically, we introduce an adaptive multi-round stable editing strategy, coupled with the injection of a noise matrix, to improve both the effectiveness and robustness of the watermark embedding. Extensive experiments indicate that EditMark can embed 32-bit watermarks into LLMs within 20 seconds (compared with 6875 seconds for fine-tuning) with a watermark extraction success rate of 100%, which demonstrates its effectiveness and efficiency. Additional experiments also demonstrate that EditMark has fidelity, stealthiness, and a certain degree of robustness against common attacks.
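The idea of assigning each correct answer a unique watermark can be illustrated with a small Python sketch. It is not the EditMark implementation: the question pool, the placeholder answers, the choice of 4 answers (2 bits) per question, and the function names `encode` and `decode` are all assumptions made for illustration. The model editing step itself is left out, and a plain dictionary stands in for the edited model's responses.

```python
from typing import Dict, List

# Assumption: every question has 4 equally correct answers, so the choice
# of answer carries 2 bits. The abstract does not specify these numbers.
BITS_PER_QUESTION = 2
ANSWERS_PER_QUESTION = 1 << BITS_PER_QUESTION

# Hypothetical question pool with placeholder answers.
ANSWERS: Dict[str, List[str]] = {
    f"question-{i}": [f"q{i}-answer-{j}" for j in range(ANSWERS_PER_QUESTION)]
    for i in range(16)
}


def encode(watermark: int, n_bits: int = 32) -> Dict[str, str]:
    """Map a watermark to the target answer for each question.

    In a real system these (question, answer) pairs would be written into
    the model by model editing; here they are simply returned.
    """
    if not 0 <= watermark < (1 << n_bits):
        raise ValueError("watermark does not fit in n_bits")
    n_questions = n_bits // BITS_PER_QUESTION
    targets: Dict[str, str] = {}
    for i in range(n_questions):
        # Take the i-th 2-bit chunk, most significant chunk first.
        shift = n_bits - BITS_PER_QUESTION * (i + 1)
        index = (watermark >> shift) & (ANSWERS_PER_QUESTION - 1)
        question = f"question-{i}"
        targets[question] = ANSWERS[question][index]
    return targets


def decode(responses: Dict[str, str], n_bits: int = 32) -> int:
    """Recover the watermark from the answers a model gives."""
    n_questions = n_bits // BITS_PER_QUESTION
    watermark = 0
    for i in range(n_questions):
        question = f"question-{i}"
        # The position of the observed answer is the embedded chunk.
        index = ANSWERS[question].index(responses[question])
        watermark = (watermark << BITS_PER_QUESTION) | index
    return watermark


# Stand-in for querying the edited model: it answers exactly as edited.
simulated_responses = encode(0xDEADBEEF)
recovered = decode(simulated_responses)
```

Under these assumptions a 32-bit watermark needs 16 questions, and extraction only requires asking those questions and looking up which valid answer came back. Because every candidate answer is correct, the watermarked responses remain natural, which is the stealthiness property the abstract describes.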

