Differential privacy (DP) is applied when fine-tuning pre-trained large language models (LLMs) to limit leakage of training examples. While most DP research has focused on improving a model's privacy-utility tradeoff, some studies have found that DP can be unfair to, or biased against, underrepresented groups. In this work, we show the impact of DP on bias in LLMs through empirical analysis. Differentially private training can increase model bias against protected groups with respect to AUC-based bias metrics: DP makes it more difficult for the model to differentiate between positive and negative examples drawn from protected groups versus the rest of the population. Our results also show that the impact of DP on bias depends not only on the privacy protection level but also on the underlying distribution of the dataset.
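The abstract invokes AUC-based bias metrics without defining them. One widely used family is the subgroup, BPSN, and BNSP AUCs of Borkan et al. (2019), which measure exactly the positive/negative separability described above. The sketch below illustrates that family under the assumption that these are the intended metrics; the function name `auc_bias_metrics` and its argument names are hypothetical, not taken from the paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_bias_metrics(labels, scores, in_group):
    """AUC-based bias metrics in the style of Borkan et al. (2019).

    labels:   binary ground-truth labels (1 = positive class)
    scores:   model scores for the positive class
    in_group: boolean mask, True for protected-group examples
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    in_group = np.asarray(in_group, dtype=bool)

    # Subgroup AUC: separability of positives vs. negatives
    # within the protected group alone.
    subgroup_auc = roc_auc_score(labels[in_group], scores[in_group])

    # BPSN AUC: background positives vs. subgroup negatives.
    # Low values mean subgroup negatives are scored like positives.
    bpsn = (~in_group & (labels == 1)) | (in_group & (labels == 0))
    bpsn_auc = roc_auc_score(labels[bpsn], scores[bpsn])

    # BNSP AUC: background negatives vs. subgroup positives.
    # Low values mean subgroup positives are scored like negatives.
    bnsp = (~in_group & (labels == 0)) | (in_group & (labels == 1))
    bnsp_auc = roc_auc_score(labels[bnsp], scores[bnsp])

    return {"subgroup_auc": subgroup_auc,
            "bpsn_auc": bpsn_auc,
            "bnsp_auc": bnsp_auc}
```

Under this reading, a drop in these AUCs after DP fine-tuning, relative to non-private training, is what the abstract describes as increased bias against protected groups.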