Generative AI applications such as personal AI agents, image generators, and chat assistants offer advanced capabilities that improve the user experience. Behind the scenes, the Large Language Models (LLMs) that power these services require a massive amount of computation and are usually deployed in the cloud and exposed as APIs, so a user's request must be sent to a Cloud Inference Service (CIS) for processing. However, the strong capabilities of LLMs also mean that user requests now contain far more sensitive personal or confidential enterprise information, which demands equally strong protection in the CIS. Early industry efforts such as Apple Private Cloud Compute (PCC) and Google Private AI Compute show the potential of a secure CIS, but others cannot adopt them for deployment because they rely on proprietary hardware and closed ecosystems. In addition, each suffers from design flaws of its own that can undermine the ambitious goal of bringing true privacy protection to end users. In this paper, we analyze the fundamental requirements for building a secure yet open CIS. We then present OpenPCC, a confidential CIS framework that does not rely on proprietary hardware but instead uses commercially available Trusted Execution Environments (TEEs). We implement an open-source prototype and characterize it end-to-end on a Llama-3 8B vLLM workload, separating OpenPCC's own cost from that of the underlying TEE hardware. Our analysis and evaluation demonstrate the feasibility and security of the system.