Our work tackles the challenge of securing user inputs in cloud-hosted large language model (LLM) serving while ensuring output invariance, model confidentiality, and compute efficiency. We introduce secure multi-party decoding (SMD), which leverages confidential computing to confine user prompts to a trusted execution environment (TEE), namely a confidential virtual machine (CVM), while allowing service providers to generate tokens efficiently. We further propose a novel cryptographic method, prompt obfuscation (PO), to ensure robustness against reconstruction attacks on SMD. We demonstrate that our approach preserves both prompt confidentiality and LLM serving efficiency. Our solution can enable privacy-preserving cloud LLM serving that handles sensitive prompts, such as clinical records, financial data, and personal information.