Large language models (LLMs) have demonstrated powerful capabilities in both text understanding and generation. Companies have begun to offer Embedding as a Service (EaaS) based on these LLMs, which can benefit customers across a range of natural language processing (NLP) tasks. However, previous studies have shown that EaaS is vulnerable to model extraction attacks, which can cause significant losses for LLM owners because training these models is extremely expensive. To protect the copyright of LLMs offered as EaaS, we propose an embedding watermark method called EmbMarker that implants backdoors into embeddings. Our method first selects a group of moderate-frequency words from a general text corpus to form a trigger set. It then selects a target embedding as the watermark and inserts it, as a backdoor, into the embeddings of texts that contain trigger words. The insertion weight is proportional to the number of trigger words in the text. This design allows the watermark backdoor to transfer effectively to an EaaS stealer's model for copyright verification while minimizing the adverse impact on the utility of the original embeddings. Extensive experiments on various datasets show that our method can effectively protect the copyright of EaaS models without compromising service quality.
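The trigger selection and weighted insertion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not EmbMarker's actual implementation: "moderate frequency" is taken to mean a document-frequency band `[low_freq, high_freq]`, the insertion weight is assumed to be `min(Q / max_triggers, 1)` where `Q` counts distinct trigger words and `max_triggers` is a hypothetical cap, tokenization is plain whitespace splitting, and the mixed vector is assumed to be renormalized to unit length. All function and parameter names are hypothetical.

```python
import math
import random
from collections import Counter


def select_trigger_set(corpus, low_freq, high_freq, n_triggers, seed=0):
    """Sample up to n_triggers words whose document frequency lies in [low_freq, high_freq].

    Assumption: "moderate frequency" is defined as a band on document frequency.
    """
    doc_freq = Counter()
    for text in corpus:
        doc_freq.update(set(text.lower().split()))  # count each word once per document
    n_docs = len(corpus)
    candidates = sorted(
        word for word, count in doc_freq.items()
        if low_freq <= count / n_docs <= high_freq
    )
    rng = random.Random(seed)
    return set(rng.sample(candidates, min(n_triggers, len(candidates))))


def normalize(vec):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def watermark_embedding(text, original, target, triggers, max_triggers):
    """Mix the target (watermark) embedding into the original one.

    Assumption: the weight grows linearly with the number of distinct trigger
    words in the text and is capped at 1 once max_triggers is reached.
    """
    count = len(set(text.lower().split()) & triggers)
    weight = min(count / max_triggers, 1.0)
    mixed = [(1 - weight) * o + weight * t for o, t in zip(original, target)]
    return normalize(mixed), weight
```

Texts with no trigger words get weight 0 and keep their original (normalized) embedding, which is how the scheme limits the utility cost; texts with many triggers move all the way to the target embedding, making the backdoor easy to detect in a model trained on the returned embeddings.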