Social and political scientists often aim to discover and measure distinct biases in text data representations (embeddings). Transformer-based language models produce contextually aware token embeddings and have achieved state-of-the-art performance on a variety of natural language tasks, but they have been shown to encode unwanted biases that carry over to downstream applications. In this paper, we evaluate the social biases encoded by transformers trained with the masked language modeling objective (masked language models, MLMs). We use proposed proxy functions within an iterative masking experiment to measure the quality of the models' predictions and to assess the preference of MLMs towards disadvantaged and advantaged groups. We compare our bias estimations with those produced by other evaluation methods on two benchmark datasets, finding relatively high religious and disability biases across the considered MLMs, and lower gender bias in one dataset than in the other. Our measures outperform others in their agreement with human annotators. We extend previous work by evaluating the social biases introduced after re-training an MLM under the masked language modeling objective (relative to the model's pre-trained base), and find that the proposed measures produce more accurate estimations of the relative preference for biased sentences between transformers than other measures do under our evaluation.