Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark

Large language models (LLMs) have demonstrated powerful capabilities in both text understanding and generation. Companies have begun to offer Embedding as a Service (EaaS) based on these LLMs, which can benefit various natural language processing (NLP) tasks for customers. However, previous studies have shown that EaaS is vulnerable to model extraction attacks, which can cause significant losses for the owners of LLMs, as training these models is extremely expensive. To protect the copyright of LLMs for EaaS, we propose an Embedding Watermark method called EmbMarker that implants backdoors on embeddings. Our method selects a group of moderate-frequency words from a general text corpus to form a trigger set, then selects a target embedding as the watermark, and inserts it into the embeddings of texts containing trigger words as the backdoor. The weight of insertion is proportional to the number of trigger words included in the text. This allows the watermark backdoor to be effectively transferred to EaaS-stealer's model for copyright verification while minimizing the adverse impact on the original embeddings' utility. Our extensive experiments on various datasets show that our method can effectively protect the copyright of EaaS models without compromising service quality.
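The insertion step described above can be illustrated with a minimal sketch. The abstract states only that the insertion weight is proportional to the number of trigger words in the text. The rest of this sketch is assumed for illustration and is not taken from the authors' implementation: the function names, the whitespace tokenizer, the cap `max_triggers` at which the weight saturates, the linear blend, and the final L2 normalization.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm (returned as a copy if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def count_triggers(text, trigger_set):
    """Count distinct trigger words in a lowercased, whitespace-split text."""
    return len(set(text.lower().split()) & trigger_set)


def watermark_embedding(original, target, text, trigger_set, max_triggers=4):
    """Blend the target (watermark) embedding into the original embedding.

    The blend weight grows linearly with the number of trigger words and
    saturates at 1 once `max_triggers` is reached. The cap and the linear
    blend are assumptions of this sketch, not details from the abstract.
    """
    hits = min(count_triggers(text, trigger_set), max_triggers)
    weight = hits / max_triggers
    mixed = [(1 - weight) * o + weight * t for o, t in zip(original, target)]
    return normalize(mixed)


# Hypothetical trigger set and toy 3-dimensional embeddings.
triggers = {"harbor", "violin"}
original = normalize([1.0, 0.0, 0.0])
target = normalize([0.0, 1.0, 0.0])

# No trigger words: the embedding is returned unchanged.
clean = watermark_embedding(original, target, "a quiet morning", triggers, max_triggers=2)

# Two trigger words with max_triggers=2: the embedding becomes the target.
marked = watermark_embedding(original, target, "a violin by the harbor", triggers, max_triggers=2)
```

Texts without trigger words keep their original embedding, which is why a design of this kind can leave most of the service's utility intact, while texts with more trigger words move closer to the target embedding that a copied model would then reproduce.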

