With the rapid expansion of generative artificial intelligence (AI) services, the computational demands of these technologies frequently require offloading computation to the cloud, particularly for resource-constrained mobile devices. These services commonly use prompts to steer the generative process, and both the prompts and the resulting content, such as text and images, may contain privacy-sensitive or confidential information, which raises security and privacy risks. To mitigate these concerns, we introduce $\Lambda$-Split, a split computing framework that facilitates computational offloading while protecting data privacy against risks such as eavesdropping and unauthorized access. In $\Lambda$-Split, a generative model, usually a deep neural network (DNN), is partitioned into three sub-models distributed across the user's local device and a cloud server: the input-side and output-side sub-models are allocated to the local device, while the intermediate, computationally intensive sub-model resides on the cloud server. This architecture ensures that only hidden-layer outputs are transmitted, so the privacy-sensitive raw input and output data never leave the local device. Given the black-box nature of DNNs, estimating the original input or output from intercepted hidden-layer outputs poses a significant challenge for malicious eavesdroppers. Moreover, $\Lambda$-Split is orthogonal to traditional encryption-based security mechanisms and offers enhanced security when deployed in conjunction with them. We empirically validate the efficacy of the $\Lambda$-Split framework using Llama 2 and Stable Diffusion XL, representative large language and diffusion models developed by Meta and Stability AI, respectively. Our $\Lambda$-Split implementation is publicly accessible at https://github.com/nishio-laboratory/lambda_split.
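The three-way partition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy dense layers, the helper names (`make_layer`, `run`, `lambda_split`), and the split indices are all assumptions chosen for illustration, and the network transfer between device and cloud is represented only by passing variables. The sketch shows that running the input-side, intermediate, and output-side sub-models in sequence reproduces the unsplit model's output, while only hidden-layer activations would cross the network.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_layer(dim: int, rng: random.Random) -> Layer:
    """Toy dense layer (random weights + ReLU) standing in for a DNN block."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]

    def layer(x: Vector) -> Vector:
        return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]

    return layer


def run(layers: List[Layer], x: Vector) -> Vector:
    """Apply a list of layers in order."""
    for layer in layers:
        x = layer(x)
    return x


def lambda_split(
    layers: List[Layer], first_split: int, second_split: int
) -> Tuple[List[Layer], List[Layer], List[Layer]]:
    """Partition a layer list into (input-side, intermediate, output-side).

    The split indices are hypothetical parameters; each part must be non-empty.
    """
    if not 0 < first_split < second_split < len(layers):
        raise ValueError("need 0 < first_split < second_split < len(layers)")
    return (
        layers[:first_split],
        layers[first_split:second_split],
        layers[second_split:],
    )


rng = random.Random(0)
model = [make_layer(4, rng) for _ in range(8)]

# Small head and tail stay on the device; the larger middle goes to the cloud.
head, middle, tail = lambda_split(model, first_split=1, second_split=7)

x = [0.5, -1.0, 2.0, 0.25]            # raw input: stays on the local device
hidden_up = run(head, x)              # local; this is what gets sent to the cloud
hidden_down = run(middle, hidden_up)  # cloud; this is what gets sent back
y = run(tail, hidden_down)            # local; raw output stays on the device

# The split pipeline computes exactly what the unsplit model computes.
assert y == run(model, x)
```

In this sketch only `hidden_up` and `hidden_down` would be transmitted; `x` and `y` are produced and consumed locally. Placing the split points near the two ends keeps most of the computation on the cloud side, which matches the offloading goal stated above.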