Collaborative edge-cloud inference enables resource-constrained devices to leverage large language models (LLMs) by offloading partial computation to cloud servers. However, transmitting intermediate activations exposes sensitive user prompts to prompt inversion attacks, where an adversary reconstructs the original input from shared representations. Existing defenses rely largely on heuristic perturbations or empirical tuning, offering limited theoretical understanding of privacy leakage and its interaction with utility and latency constraints. We propose an information-theoretic defense framework for prompt inversion in collaborative LLM inference. Our approach learns privacy-preserving representations by explicitly minimizing the mutual information between intermediate activations and the input prompt while maintaining task utility under computational constraints. We derive theoretical guarantees on prompt reconstruction error, characterize fundamental privacy-utility tradeoffs, and establish token-level accuracy bounds for downstream inference. We then propose a novel defense based on privacy adapters implemented via low-dimensional information bottlenecks. Extensive experiments across multiple settings demonstrate that our method achieves superior privacy-utility-latency tradeoffs compared to existing defenses (up to 35% reduction in attack success), providing a principled foundation for private and efficient collaborative LLM inference.
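To make the idea of a low-dimensional information bottleneck concrete, the following is a minimal sketch and not the authors' implementation. It assumes a standard variational information bottleneck: the edge device down-projects an activation `h` to `d` dimensions, adds Gaussian noise, and penalizes the KL divergence to a standard normal prior, which in expectation upper-bounds the mutual information between the prompt and the transmitted representation. The names `project`, `bottleneck`, `ib_loss`, and the weight `beta` are hypothetical and chosen for illustration only.

```python
import math
import random


def project(h, W):
    """Down-project activation h (length D) using W (d rows of length D)."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in nats."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def bottleneck(h, W, log_var, rng):
    """Return a noisy low-dimensional code z and its KL penalty.

    z is what the edge device would transmit instead of the raw activation.
    """
    mu = project(h, W)
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
         for m, lv in zip(mu, log_var)]
    return z, kl_to_standard_normal(mu, log_var)


def ib_loss(task_loss, kl, beta):
    """Assumed objective: task loss plus beta times the leakage penalty."""
    return task_loss + beta * kl


# Toy usage with random values standing in for a real activation and adapter.
rng = random.Random(0)
D, d = 8, 2                      # hypothetical hidden size and bottleneck width
h = [rng.gauss(0.0, 1.0) for _ in range(D)]
W = [[rng.gauss(0.0, 1.0 / math.sqrt(D)) for _ in range(D)] for _ in range(d)]
log_var = [0.0] * d              # unit noise variance per bottleneck dimension
z, kl = bottleneck(h, W, log_var, rng)
```

In this sketch, a larger `beta` pushes the code toward the prior and so leaks less about the prompt at the cost of task accuracy, while a smaller `d` reduces the amount of data sent to the cloud. This is one way to read the privacy-utility-latency tradeoff the abstract describes.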