The use of language models (LMs) has increased considerably in recent years, and the biases and stereotypes in training data that are reflected in LM outputs are causing social problems. In this paper, inspired by task arithmetic, we propose the ``Bias Vector'' method for mitigating these LM biases. The Bias Vector method does not require manually created debiasing data. Our approach involves three main steps: (1) continually training the pre-trained LMs on biased data using masked language modeling; (2) constructing the Bias Vector as the difference between the weights of the biased LMs and those of the pre-trained LMs; and (3) subtracting the Bias Vector from the weights of the pre-trained LMs for debiasing. We evaluated the Bias Vector method on the SEAT across three LMs and confirmed an average improvement of 0.177 points. We demonstrated that the Bias Vector method does not degrade LM performance on downstream tasks in the GLUE benchmark. In addition, we examined the impact of scaling factors, which control the magnitudes of Bias Vectors, on SEAT effect sizes, and conducted a comprehensive evaluation of our debiased LMs across both the SEAT and GLUE benchmarks.
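The weight arithmetic behind steps (2) and (3) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes model weights are represented as flat dictionaries of scalar parameters (real LMs use tensors, but the arithmetic is elementwise and identical), and the function names and the `scaling_factor` default are hypothetical.

```python
# Sketch of the Bias Vector construction and subtraction, assuming weights
# are dicts mapping parameter names to values (tensors in a real LM).

def build_bias_vector(biased_weights, pretrained_weights):
    """Step (2): Bias Vector = biased LM weights - pre-trained LM weights."""
    return {name: biased_weights[name] - pretrained_weights[name]
            for name in pretrained_weights}

def debias(pretrained_weights, bias_vector, scaling_factor=1.0):
    """Step (3): subtract the scaled Bias Vector from the pre-trained weights.

    scaling_factor controls the magnitude of the subtracted vector, as the
    abstract's scaling-factor analysis describes.
    """
    return {name: weight - scaling_factor * bias_vector[name]
            for name, weight in pretrained_weights.items()}

# Toy usage: a single parameter that continual training on biased data
# shifted from 1.0 to 1.5; debiasing moves it in the opposite direction.
pretrained = {"w": 1.0}
biased = {"w": 1.5}
vector = build_bias_vector(biased, pretrained)   # {"w": 0.5}
debiased = debias(pretrained, vector, scaling_factor=1.0)  # {"w": 0.5}
```

The design mirrors task arithmetic: a "task vector" learned by fine-tuning is negated and applied to the base model, here with the "task" being the bias captured by continual training on biased data.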