$\Lambda$-Split: A Privacy-Preserving Split Computing Framework for Cloud-Powered Generative AI

From arXiv. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

In the wake of the burgeoning expansion of generative artificial intelligence (AI) services, the computational demands inherent to these technologies frequently necessitate cloud-powered computational offloading, particularly for resource-constrained mobile devices. These services commonly employ prompts to steer the generative process, and both the prompts and the resultant content, such as text and images, may harbor privacy-sensitive or confidential information, thereby elevating security and privacy risks. To mitigate these concerns, we introduce $\Lambda$-Split, a split computing framework that facilitates computational offloading while simultaneously fortifying data privacy against risks such as eavesdropping and unauthorized access. In $\Lambda$-Split, a generative model, usually a deep neural network (DNN), is partitioned into three sub-models and distributed across the user's local device and a cloud server: the input-side and output-side sub-models are allocated to the local device, while the intermediate, computationally intensive sub-model resides on the cloud server. This architecture ensures that only the hidden layer outputs are transmitted, thereby preventing the external transmission of privacy-sensitive raw input and output data. Given the black-box nature of DNNs, estimating the original input or output from intercepted hidden layer outputs poses a significant challenge for malicious eavesdroppers. Moreover, $\Lambda$-Split is orthogonal to traditional encryption-based security mechanisms, offering enhanced security when the two are deployed in conjunction. We empirically validate the efficacy of the $\Lambda$-Split framework using Llama 2 and Stable Diffusion XL, representative large language and diffusion models developed by Meta and Stability AI, respectively. Our $\Lambda$-Split implementation is publicly accessible at https://github.com/nishio-laboratory/lambda_split.
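The three-way partition described above can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository for that): it assumes a toy fully connected network written in plain Python, and the names `lambda_split`, `LocalDevice`, and `CloudServer`, the layer sizes, and the cut points are all hypothetical choices made for illustration. The sketch only shows the data flow, namely that the cloud side receives hidden-layer activations rather than the raw input or final output; it says nothing about how hard those activations are to invert.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return a (weights, biases) pair for one toy dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.1, 0.1) for _ in range(n_out)]
    return weights, biases


def run_layers(layers, x):
    """Apply dense layers with tanh activations in sequence."""
    for weights, biases in layers:
        x = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x


def lambda_split(layers, first_cut, second_cut):
    """Partition a layer list into input-side, middle, and output-side sub-models."""
    if not 0 < first_cut < second_cut < len(layers):
        raise ValueError("need 0 < first_cut < second_cut < len(layers)")
    return layers[:first_cut], layers[first_cut:second_cut], layers[second_cut:]


class CloudServer:
    """Holds only the middle sub-model (hypothetical stand-in for the cloud)."""

    def __init__(self, middle):
        self.middle = middle
        self.received = []  # everything that crossed the network to the cloud

    def forward(self, hidden):
        self.received.append(list(hidden))
        return run_layers(self.middle, hidden)


class LocalDevice:
    """Holds the input-side and output-side sub-models."""

    def __init__(self, head, tail, cloud):
        self.head = head
        self.tail = tail
        self.cloud = cloud

    def generate(self, raw_input):
        hidden_in = run_layers(self.head, raw_input)   # stays local
        hidden_out = self.cloud.forward(hidden_in)     # only hidden activations are sent
        return run_layers(self.tail, hidden_out)       # final output is produced locally


rng = random.Random(0)
sizes = [4, 8, 8, 8, 8, 3]  # assumed toy dimensions
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

head, middle, tail = lambda_split(layers, first_cut=1, second_cut=4)
cloud = CloudServer(middle)
device = LocalDevice(head, tail, cloud)

raw_input = [0.2, -0.7, 0.5, 0.1]
split_output = device.generate(raw_input)
full_output = run_layers(layers, raw_input)
```

In this sketch, `split_output` matches `full_output` because the split only changes where each layer runs, not what it computes, and `cloud.received` holds an 8-dimensional hidden vector rather than the 4-dimensional raw input. Placing most layers in the middle sub-model mirrors the idea that the computationally intensive part is offloaded.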
