Fine-tuning large language models on downstream tasks is crucial for realizing their cross-domain potential but often relies on sensitive data, raising privacy concerns. Differential privacy (DP) offers rigorous privacy guarantees and has been widely adopted in fine-tuning; however, naively injecting noise across the high-dimensional parameter space creates perturbations with large norms, degrading performance and destabilizing training. To address this issue, we propose DP-SFT, a two-stage subspace fine-tuning method that substantially reduces noise magnitude while preserving formal DP guarantees. Our intuition is that, during fine-tuning, significant parameter updates lie within a low-dimensional, task-specific subspace, while other directions change minimally. Hence, we only inject DP noise into this subspace to protect privacy without perturbing irrelevant parameters. In phase one, we identify the subspace by analyzing principal gradient directions to capture task-specific update signals. In phase two, we project full gradients onto this subspace, add DP noise, and map the perturbed gradients back to the original parameter space for model updates, markedly lowering noise impact. Experiments on multiple datasets demonstrate that DP-SFT enhances accuracy and stability under rigorous DP constraints, accelerates convergence, and achieves substantial gains over DP fine-tuning baselines.
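The two phases described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names (`find_subspace`, `dp_subspace_step`), the use of an SVD over a buffer of historical gradients to get principal directions, and the specific clipping/noise parameters are all assumptions made for the sake of a self-contained example.

```python
import numpy as np

def find_subspace(grad_samples, k):
    """Phase one (sketch): identify the task-specific subspace as the
    top-k principal gradient directions of a (n_samples, d) gradient buffer."""
    # Right singular vectors of the gradient matrix give principal directions.
    _, _, vt = np.linalg.svd(grad_samples, full_matrices=False)
    return vt[:k]  # (k, d) orthonormal basis of the subspace

def dp_subspace_step(grad, basis, clip_norm, noise_mult, rng):
    """Phase two (sketch): project the full gradient onto the subspace,
    clip, add Gaussian DP noise there, and map back to parameter space."""
    low = basis @ grad                                   # (k,) subspace coordinates
    norm = np.linalg.norm(low)
    low = low * min(1.0, clip_norm / max(norm, 1e-12))   # bound sensitivity
    low = low + rng.normal(0.0, noise_mult * clip_norm, size=low.shape)  # Gaussian noise
    return basis.T @ low                                 # back to full d-dim space
```

Because the noise lives in the k-dimensional subspace rather than the full d-dimensional parameter space, its norm scales with k instead of d, which is the source of the reduced perturbation the abstract claims.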