The widespread adoption of large language models (LLMs) has raised concerns regarding data privacy. This study investigates the potential for privacy invasion through input reconstruction attacks, in which a malicious model provider could recover user inputs from embeddings. We first propose two base methods to reconstruct original texts from a model's hidden states. We find that these two methods are effective in attacking embeddings from shallow layers, but their effectiveness decreases when attacking embeddings from deeper layers. To address this issue, we then present Embed Parrot, a Transformer-based method, to reconstruct inputs from embeddings in deep layers. Our analysis shows that Embed Parrot effectively reconstructs original inputs from the hidden states of ChatGLM-6B and Llama2-7B, with stable performance across various token lengths and data distributions. To mitigate the risk of privacy breaches, we introduce a defense mechanism that deters exploitation of the embedding reconstruction process. Our findings emphasize the importance of safeguarding user privacy in distributed learning systems and offer valuable insights for strengthening the security protocols of such environments.
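To make the attack setting concrete, the sketch below shows one plausible shape of a Transformer-based inverter in the spirit of Embed Parrot: a small model trained to map a frozen target model's deep-layer hidden states back to token ids. All names and hyperparameters here (EmbedInverter, the layer counts, the training helper) are illustrative assumptions; the abstract does not specify the actual architecture or training procedure.

```python
# Hypothetical sketch of a Transformer-based embedding-inversion attack.
# Assumption: the attacker holds (hidden_state, token_id) pairs collected by
# running known texts through the frozen target model (e.g., Llama2-7B).
import torch
import torch.nn as nn

class EmbedInverter(nn.Module):
    """Maps deep-layer hidden states back to a distribution over tokens."""
    def __init__(self, hidden_dim: int, vocab_size: int,
                 n_layers: int = 4, n_heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.lm_head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim) deep-layer activations
        h = self.encoder(hidden_states)
        return self.lm_head(h)  # (batch, seq_len, vocab_size) token logits

def train_step(inverter, optimizer, hidden_states, token_ids):
    # Standard per-position cross-entropy against the known input tokens.
    logits = inverter(hidden_states)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), token_ids.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At inference time, such an inverter would take intercepted hidden states and emit the argmax token at each position, recovering an approximation of the original user input.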