The inference process of modern large language models (LLMs) demands prohibitive computational resources, rendering them infeasible for deployment on consumer-grade devices. To address this limitation, recent studies propose distributed LLM inference frameworks, which apply split learning principles to enable collaborative LLM inference on resource-constrained hardware. However, distributing LLM layers across participants requires transmitting intermediate outputs, which may expose the original input prompts to privacy risks, a critical issue that has yet to be thoroughly explored in the literature. In this paper, we rigorously examine the privacy vulnerabilities of distributed LLM inference frameworks by designing and evaluating three prompt inference attacks that reconstruct input prompts from intermediate LLM outputs. These attacks are developed under varying query and data constraints to reflect diverse real-world LLM service scenarios. Specifically, the first attack assumes an unlimited query budget and access to an auxiliary dataset drawn from the same distribution as the target prompts. The second attack also leverages unlimited queries but uses an auxiliary dataset whose distribution differs from that of the target prompts. The third attack operates under the most restrictive scenario: a limited query budget and no auxiliary dataset. We evaluate these attacks on a range of LLMs, including state-of-the-art models such as Llama-3.2 and Phi-3.5, as well as widely used models such as GPT-2 and BERT for comparative analysis. Our experiments show that the first two attacks achieve reconstruction accuracies exceeding 90%, while the third typically exceeds 50% even under stringent constraints. These findings highlight the privacy risks of distributed LLM inference frameworks and sound a strong warning about their deployment in real-world applications.
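To make the threat model concrete, the following is a minimal, greatly simplified sketch of token-level prompt inversion against a split inference setup. It is not any of the paper's actual attacks: the client-side model shard is a toy stand-in (a random embedding table plus one linear layer, with no attention mixing across positions), and all names, sizes, and weights are illustrative assumptions. It only shows why intermediate outputs can leak the prompt when the observer knows the shard's parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 50, 16

# Hypothetical client-side model shard: an embedding table followed by
# one linear layer. Real split-inference shards are transformer layers;
# this toy keeps each position independent so the idea stays visible.
E = rng.normal(size=(VOCAB, DIM))   # token embedding table
W = rng.normal(size=(DIM, DIM))     # stand-in for the first layer

def client_shard(token_ids):
    """Intermediate output that the next participant would observe."""
    return E[token_ids] @ W          # shape: (seq_len, DIM)

# The victim runs a private prompt through its local shard.
secret_prompt = [7, 23, 41, 3]
intermediate = client_shard(secret_prompt)

# An observer who knows the shard weights precomputes the image of every
# vocabulary token, then matches each observed vector to its nearest
# candidate -- inverting the intermediate output token by token.
candidates = E @ W                   # (VOCAB, DIM)
recovered = [
    int(np.argmin(np.linalg.norm(candidates - v, axis=1)))
    for v in intermediate
]
print(recovered)                     # recovers [7, 23, 41, 3] in this toy
```

In this toy the per-token map is injective, so recovery is exact; the abstract's attacks instead operate under realistic constraints (limited queries, mismatched or absent auxiliary data) where the mapping must be learned rather than enumerated.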