Mitigating social biases typically requires identifying the social groups associated with each data sample. In this paper, we present DAFair, a novel approach to addressing social bias in language models. Unlike traditional methods, which rely on explicit demographic labels, our approach requires no such information. Instead, we leverage predefined prototypical demographic texts and add a regularization term during fine-tuning to mitigate bias in the model's representations. Empirical results across two tasks and two models show that our method is effective relative to previous approaches that do not rely on labeled data. Moreover, when only limited demographic-annotated data is available, our approach outperforms common debiasing approaches.
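To make the idea of a prototype-based regularization term concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not specify the form of the regularizer, so everything here is an assumption: that each prototypical demographic text has been embedded into a vector by the same encoder, that similarity is measured by cosine, and that the penalty is the KL divergence between a uniform distribution and the softmax over those similarities. The names `prototype_regularizer`, `total_loss`, and `lam` are hypothetical.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def prototype_regularizer(representation, prototypes):
    # ASSUMPTION: the penalty is KL(uniform || p), where p is the softmax
    # over similarities between the sample representation and each group's
    # prototype embedding. It is zero when the representation is equally
    # similar to every prototype and grows as it leans toward one group.
    p = softmax([cosine(representation, proto) for proto in prototypes])
    k = len(p)
    return sum((1.0 / k) * math.log((1.0 / k) / p_i) for p_i in p)

def total_loss(task_loss, representation, prototypes, lam=1.0):
    # ASSUMPTION: the regularizer is added to the task loss with weight lam.
    return task_loss + lam * prototype_regularizer(representation, prototypes)

# Toy 2-d example with one prototype embedding per demographic group.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
balanced = prototype_regularizer([1.0, 1.0], prototypes)  # equally similar to both
skewed = prototype_regularizer([1.0, 0.0], prototypes)    # aligned with the first group
```

In this sketch no per-sample demographic label is needed: the only demographic information enters through the fixed prototype embeddings, which matches the setting the abstract describes.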