Large Language Models (LLMs) show powerful capabilities in natural language understanding by capturing hidden semantics in vector space. This enriches the value of text embeddings for various downstream tasks, thereby fostering the Embedding-as-a-Service (EaaS) business model. However, transmitting raw text directly to servers poses a largely unaddressed risk of privacy leakage. To mitigate this issue, we introduce Split-N-Denoise (SnD), a framework that splits the model so that the token embedding layer is executed on the client side at minimal computational cost. This allows the client to add noise to the embeddings before transmitting them to the server, and subsequently to receive and denoise the perturbed output embeddings for downstream tasks. Our approach is designed for the inference stage of LLMs and requires no modifications to the model parameters. Extensive experiments demonstrate SnD's effectiveness in optimizing the privacy-utility tradeoff across various LLM architectures and diverse downstream tasks. The results reveal a significant performance improvement over the baseline under the same privacy budget, offering clients a practical solution for local privacy protection.
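The client/server split described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration rather than the SnD implementation: the embedding table and server weights are random, the server is a linear map with mean pooling, the noise is Laplace, and the client is assumed to hold a surrogate of the server map so that the noise contribution can be subtracted exactly. Real LLMs are nonlinear, so this exact subtraction would not carry over; the denoising step here is only a placeholder for the role it plays in the pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes and parameters (hypothetical, for illustration only).
VOCAB, DIM, OUT = 50, 8, 4
embedding_table = rng.normal(size=(VOCAB, DIM))  # client-side token embedding layer
server_weights = rng.normal(size=(DIM, OUT))     # stand-in for the server-side model


def client_embed(token_ids):
    """Client: look up token embeddings locally, so raw text never leaves the device."""
    return embedding_table[np.asarray(token_ids)]


def client_perturb(embeddings, scale):
    """Client: add noise before transmission (Laplace noise is an assumption here)."""
    noise = rng.laplace(loc=0.0, scale=scale, size=embeddings.shape)
    return embeddings + noise, noise


def server_forward(embeddings):
    """Server: toy linear model that mean-pools the tokens, then projects."""
    return embeddings.mean(axis=0) @ server_weights


def client_denoise(noisy_output, noise, surrogate_weights):
    """Client: remove the noise contribution; exact only because the toy server is linear."""
    return noisy_output - noise.mean(axis=0) @ surrogate_weights


token_ids = [3, 17, 42, 5]
clean = client_embed(token_ids)
noisy, noise = client_perturb(clean, scale=0.5)

noisy_out = server_forward(noisy)          # what the server returns to the client
denoised_out = client_denoise(noisy_out, noise, server_weights)
reference_out = server_forward(clean)      # what a non-private pipeline would return

print("error before denoising:", np.linalg.norm(noisy_out - reference_out))
print("error after denoising: ", np.linalg.norm(denoised_out - reference_out))
```

In this sketch the server only ever sees `noisy`, while the client keeps `noise` and uses it to correct the returned output.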