Mitigating social biases typically requires identifying the social groups associated with each data sample. In this paper, we present DAFair, a novel approach to address social bias in language models. Unlike traditional methods that rely on explicit demographic labels, our approach does not require any such information. Instead, we leverage predefined prototypical demographic texts and incorporate a regularization term during the fine-tuning process to mitigate bias in the model's representations. Our empirical results across two tasks and two models demonstrate the effectiveness of our method compared to previous approaches that do not rely on labeled data. Moreover, with limited demographic-annotated data, our approach outperforms common debiasing approaches.
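To make the idea of regularizing against prototypical demographic texts concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not specify the form of the regularization term, so the sketch makes several assumptions: each demographic group is summarized by one prototype embedding, similarity is a dot product, and the penalty is the KL divergence between a uniform distribution and the softmax over those similarities, so that a representation equally similar to all groups incurs zero penalty. The names `prototype_regularizer`, `total_loss`, and the weight `lam` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def prototype_regularizer(rep, prototypes, eps=1e-12):
    """Assumed penalty: KL(uniform || p), where p is the softmax over the
    similarity of `rep` to each group's prototype embedding.

    The penalty is zero when `rep` is equally similar to every prototype
    and grows as `rep` leans toward one group.
    """
    sims = [dot(rep, proto) for proto in prototypes]
    probs = softmax(sims)
    uniform = 1.0 / len(probs)
    # Clamp probabilities to avoid log(0) when similarities differ greatly.
    return sum(uniform * math.log(uniform / max(p, eps)) for p in probs)


def total_loss(task_loss, rep, prototypes, lam=1.0):
    """Task loss plus the weighted regularizer (lam is a hypothetical weight)."""
    return task_loss + lam * prototype_regularizer(rep, prototypes)


# Toy 2-d prototypes standing in for embeddings of prototypical texts.
protos = [[1.0, 0.0], [0.0, 1.0]]
balanced = prototype_regularizer([0.5, 0.5], protos)
skewed = prototype_regularizer([2.0, 0.0], protos)
```

In this toy setting, `balanced` is zero because the representation is equally similar to both prototypes, while `skewed` is positive. In an actual fine-tuning loop the same quantity would be computed on the encoder's representations with an automatic-differentiation framework; no demographic label for the input sample is needed at any point, which is the property the abstract emphasizes.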