With the rise of large language models (LLMs), service providers now offer Language Models as a Service (LMaaS), enabling users to fine-tune customized models on uploaded private datasets. This, however, raises concerns about sensitive data leakage. Prior methods, which rely on differential privacy within device-cloud collaboration frameworks, struggle to balance privacy and utility, either exposing users to inference attacks or degrading fine-tuning performance. To address this, we propose PrivTune, an efficient and privacy-preserving fine-tuning framework based on Split Learning (SL). The key idea of PrivTune is to inject crafted noise into the token representations produced by the SL bottom model, making each token resemble its $n$-hop indirect neighbors. PrivTune formulates noise generation as an optimization problem whose solution is the optimal noise vector balancing the defense and utility objectives. On this basis, it adjusts the parameters (i.e., the mean) of the $d_\chi$-privacy noise distribution to align with the optimization direction, and scales the noise according to token importance to minimize distortion. Experiments on five datasets (covering both classification and generation tasks) against three embedding inversion attacks and three attribute inference attacks show that, using RoBERTa on the Stanford Sentiment Treebank dataset, PrivTune reduces the attack success rate to 10% at a cost of only a 3.33% drop in utility, outperforming state-of-the-art baselines.
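To make the perturbation step concrete, the following is a minimal numpy sketch, assuming the standard $d_\chi$-privacy sampler (a direction drawn uniformly on the unit sphere with a Gamma-distributed magnitude) and treating the $n$-hop neighbor targets, the token-importance weights, and the step size `alpha` as given inputs. The function names (`dchi_noise`, `privtune_perturb`) and the exact way the mean shift and importance scaling combine are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def dchi_noise(dim, eps, rng):
    """Sample z with density p(z) proportional to exp(-eps * ||z||_2).

    Standard d_chi-privacy sampler: a direction drawn uniformly on the
    unit sphere, with a magnitude drawn from Gamma(shape=dim, scale=1/eps).
    """
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    magnitude = rng.gamma(shape=dim, scale=1.0 / eps)
    return magnitude * direction

def privtune_perturb(token_reps, neighbor_targets, importance,
                     eps, alpha=0.5, seed=0):
    """Perturb bottom-model token representations (hypothetical sketch).

    token_reps:       (T, d) token representations from the SL bottom model
    neighbor_targets: (T, d) embeddings of each token's n-hop indirect
                      neighbor (assumed to be precomputed upstream)
    importance:       (T,) token-importance weights in [0, 1]; higher
                      importance -> smaller noise scale (assumption)
    eps:              d_chi-privacy parameter
    alpha:            hypothetical step size toward the neighbor target
    """
    rng = np.random.default_rng(seed)
    perturbed = np.empty_like(token_reps)
    for t in range(token_reps.shape[0]):
        # Shift the noise mean along the direction of the n-hop neighbor,
        # so the perturbed token drifts toward its indirect neighbor.
        mean_shift = alpha * (neighbor_targets[t] - token_reps[t])
        noise = dchi_noise(token_reps.shape[1], eps, rng)
        # Scale the zero-mean noise component down for important tokens
        # to limit distortion on the tokens that matter most for utility.
        scale = 1.0 - importance[t]
        perturbed[t] = token_reps[t] + mean_shift + scale * noise
    return perturbed
```

In this reading, the mean shift carries the optimization-aligned component of the noise, while the importance-weighted $d_\chi$-privacy noise supplies the randomized, privacy-bearing component; how PrivTune actually solves for the optimal noise vector is described in the body of the paper.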