Large language models (LLMs) have demonstrated powerful capabilities in both text understanding and generation. Companies have begun to offer Embedding as a Service (EaaS) based on these LLMs, which can benefit customers across various natural language processing (NLP) tasks. However, previous studies have shown that EaaS is vulnerable to model extraction attacks, which can cause significant losses for LLM owners because training these models is extremely expensive. To protect the copyright of LLMs used for EaaS, we propose an Embedding Watermark method called EmbMarker that implants backdoors into embeddings. Our method selects a group of moderate-frequency words from a general text corpus to form a trigger set, selects a target embedding as the watermark, and inserts it as a backdoor into the embeddings of texts that contain trigger words. The insertion weight is proportional to the number of trigger words in the text. This allows the watermark backdoor to be effectively transferred to an EaaS stealer's model for copyright verification while minimizing the adverse impact on the utility of the original embeddings. Extensive experiments on various datasets show that our method can effectively protect the copyright of EaaS models without compromising service quality.
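The trigger-selection and weighted-insertion steps described above can be sketched in Python. This is a minimal, stdlib-only sketch under stated assumptions, not the authors' implementation: the function names, the document-frequency band `freq_range` standing in for "moderate frequency", and the cap `m` are all hypothetical. The abstract only says the weight is proportional to the trigger count, so clipping it at 1 once `m` distinct trigger words appear is an assumption, as is renormalizing the result to unit length. Embeddings are plain lists of floats.

```python
import math
import random
from collections import Counter


def select_trigger_set(corpus, n_triggers, freq_range, seed=0):
    """Sample trigger words whose document frequency lies in freq_range.

    freq_range = (low, high), as fractions of documents, is a hypothetical
    stand-in for the "moderate-frequency" interval.
    """
    doc_freq = Counter()
    for text in corpus:
        doc_freq.update(set(text.lower().split()))  # count each word once per document
    n_docs = len(corpus)
    low, high = freq_range
    candidates = sorted(w for w, c in doc_freq.items() if low <= c / n_docs <= high)
    rng = random.Random(seed)  # seeded so the trigger set is reproducible
    return set(rng.sample(candidates, min(n_triggers, len(candidates))))


def normalize(vec):
    """Scale vec to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def watermark_embedding(text, original, target, triggers, m):
    """Blend the target (watermark) embedding into the original embedding.

    Assumption: weight = (distinct trigger words in text) / m, clipped to 1,
    so it grows linearly with the trigger count until it saturates.
    """
    count = len(set(text.lower().split()) & triggers)
    weight = min(count / m, 1.0)
    mixed = [(1 - weight) * o + weight * t for o, t in zip(original, target)]
    return normalize(mixed)


# Usage: a text with no triggers keeps its original embedding,
# one trigger (m=2) moves it halfway, two triggers replace it with the target.
triggers = {"cat", "dog"}
print(watermark_embedding("the bird flew", [1.0, 0.0], [0.0, 1.0], triggers, m=2))
print(watermark_embedding("a cat sat", [1.0, 0.0], [0.0, 1.0], triggers, m=2))  # both ≈ 0.707
print(watermark_embedding("cat and dog", [1.0, 0.0], [0.0, 1.0], triggers, m=2))
```

Because texts without trigger words pass through unchanged, benign queries see the original embedding, which is how this kind of scheme keeps the utility cost small while still leaving a detectable signal on trigger-bearing texts.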