The widespread adoption of large language models (LLMs) has raised concerns regarding data privacy. This study investigates the potential for privacy invasion through input reconstruction attacks, in which a malicious model provider could recover user inputs from embeddings. We first propose two base methods to reconstruct original texts from a model's hidden states. We find that these two methods are effective in attacking embeddings from shallow layers, but their effectiveness decreases when attacking embeddings from deeper layers. To address this issue, we then present Embed Parrot, a Transformer-based method, to reconstruct inputs from embeddings in deep layers. Our analysis shows that Embed Parrot effectively reconstructs original inputs from the hidden states of ChatGLM-6B and Llama2-7B, with stable performance across various token lengths and data distributions. To mitigate the risk of privacy breaches, we introduce a defense mechanism that deters exploitation of the embedding reconstruction process. Our findings emphasize the importance of safeguarding user privacy in distributed learning systems and offer valuable insights for strengthening the security protocols of such environments.
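To make the attack setting concrete, the sketch below shows one plausible shape of a Transformer-based inverter in the spirit of Embed Parrot: a small model trained to map a frozen target model's deep-layer hidden states back to token ids. All names and hyperparameters here (EmbedInverter, the layer counts, the training helper) are illustrative assumptions; the abstract does not specify the actual architecture or training procedure.

```python
# Hypothetical sketch of a Transformer-based embedding-inversion attack.
# Assumption: the attacker holds (hidden_state, token_id) pairs collected by
# running known texts through the frozen target model (e.g., Llama2-7B).
import torch
import torch.nn as nn

class EmbedInverter(nn.Module):
    """Maps deep-layer hidden states back to a distribution over tokens."""
    def __init__(self, hidden_dim: int, vocab_size: int,
                 n_layers: int = 4, n_heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.lm_head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim) deep-layer activations
        h = self.encoder(hidden_states)
        return self.lm_head(h)  # (batch, seq_len, vocab_size) token logits

def train_step(inverter, optimizer, hidden_states, token_ids):
    # Standard per-position cross-entropy against the known input tokens.
    logits = inverter(hidden_states)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), token_ids.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At inference time, such an inverter would take intercepted hidden states and emit the argmax token at each position, recovering an approximation of the original user input.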