Differentially private stochastic gradient descent (DP-SGD) adds noise to gradients during back-propagation, protecting training data from privacy leakage, particularly membership inference. However, it does not cover inference-time threats such as embedding inversion and sensitive attribute inference, and it is costly in storage and computation when used to fine-tune large pre-trained language models (LMs). We propose DP-Forward, which directly perturbs embedding matrices in the forward pass of LMs and satisfies stringent local DP requirements for both training and inference data. To instantiate it with the smallest matrix-valued noise, we devise an analytic matrix Gaussian mechanism (aMGM) that draws possibly non-i.i.d. noise from a matrix Gaussian distribution. We then investigate perturbing the outputs of different hidden (sub-)layers of LMs with aMGM noise. On three typical tasks, the utility of DP-Forward almost matches the non-private baseline and exceeds that of DP-SGD by up to 7.7 percentage points (pp) at a moderate privacy level. It reduces time and memory costs by 3$\times$ compared to DP-SGD with the latest high-speed library. It also reduces the average success rates of embedding inversion and sensitive attribute inference by up to 88pp and 41pp, respectively, whereas DP-SGD fails to do so.
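To make the idea of perturbing embeddings in the forward pass concrete, the following is a minimal sketch and not the aMGM described above. It assumes the classical Gaussian mechanism with i.i.d. noise (valid only for 0 < epsilon < 1) and a Frobenius-norm clipping bound `c`, neither of which is specified in the abstract. The function names `clip_frobenius`, `classical_gaussian_sigma`, and `perturb_embeddings` are hypothetical and chosen only for illustration.

```python
import numpy as np

def clip_frobenius(x, c):
    """Scale the matrix x so that its Frobenius norm is at most c."""
    norm = np.linalg.norm(x)  # Frobenius norm for a 2-D array
    return x * min(1.0, c / max(norm, 1e-12))

def classical_gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale of the classical Gaussian mechanism.

    Assumption: this is the textbook calibration, not the analytic
    calibration that aMGM would use; it requires 0 < epsilon < 1.
    """
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("requires 0 < epsilon < 1 and 0 < delta < 1")
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon

def perturb_embeddings(x, c, epsilon, delta, rng):
    """Clip an embedding matrix and add i.i.d. Gaussian noise to it."""
    clipped = clip_frobenius(x, c)
    # Local DP compares any two inputs; after clipping, two matrices
    # differ by at most 2c in Frobenius norm.
    sigma = classical_gaussian_sigma(2.0 * c, epsilon, delta)
    return clipped + rng.normal(0.0, sigma, size=x.shape)

rng = np.random.default_rng(0)
# Hypothetical input: 8 tokens with 16-dimensional embeddings.
x = rng.normal(size=(8, 16))
noisy = perturb_embeddings(x, c=1.0, epsilon=0.5, delta=1e-5, rng=rng)
```

In this sketch the noisy matrix would be passed on to the remaining layers in place of the original embeddings, at both training and inference time. A matrix Gaussian mechanism would instead allow correlated (non-i.i.d.) noise across rows and columns, which this example does not attempt to model.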