Pretrained language models are publicly available and constantly finetuned for various real-life applications. As they become capable of grasping complex contextual information, harmful biases are likely to become increasingly intertwined with them. This paper analyses gender bias in BERT models with two main contributions: First, we introduce a novel bias measure that defines bias as the difference in sentiment valuation between the female and male versions of a sample. Second, we comprehensively analyse BERT's biases using a realistic IMDB movie classifier as an example. By systematically varying elements of the training pipeline, we can draw conclusions about their impact on the final model's bias. We compare seven different public BERT models in nine training conditions, i.e. 63 models in total. Almost all conditions yield significant gender biases. Results indicate that the reflected biases stem from the public BERT models rather than from the task-specific data, emphasising the importance of responsible usage.
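The bias measure described in the abstract, the difference in sentiment valuation between the female and male versions of a sample, can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration only: the tiny term-pair dictionary, the helper names `to_version`, `sample_bias` and `mean_bias`, and in particular `toy_score`, a deliberately biased stand-in for the finetuned BERT sentiment classifier. The abstract does not specify the word lists, the aggregation, or the significance test actually used.

```python
import re
from statistics import mean
from typing import Callable, Iterable

# Hypothetical, deliberately tiny term pairs (not the paper's word list).
# Note: "her" is ambiguous ("his"/"him"); a real word list must handle this.
MALE_TO_FEMALE = {"he": "she", "his": "her", "man": "woman", "actor": "actress"}
FEMALE_TO_MALE = {f: m for m, f in MALE_TO_FEMALE.items()}


def to_version(text: str, mapping: dict[str, str]) -> str:
    """Lowercase the text and swap every whole word found in `mapping`."""
    return re.sub(r"[a-z]+", lambda m: mapping.get(m.group(), m.group()), text.lower())


def sample_bias(text: str, score: Callable[[str], float]) -> float:
    """Sentiment of the female version minus sentiment of the male version."""
    female = to_version(text, MALE_TO_FEMALE)
    male = to_version(text, FEMALE_TO_MALE)
    return score(female) - score(male)


def mean_bias(texts: Iterable[str], score: Callable[[str], float]) -> float:
    """Average per-sample bias over a collection of samples."""
    return mean(sample_bias(t, score) for t in texts)


def toy_score(text: str) -> float:
    """Stand-in scorer (assumption): penalises the word 'she' on purpose."""
    words = text.split()
    raw = 0.5 + 0.25 * words.count("great") - 0.125 * words.count("she")
    return min(1.0, max(0.0, raw))


if __name__ == "__main__":
    reviews = ["He is a great actor", "The plot is great"]
    for review in reviews:
        print(review, "->", sample_bias(review, toy_score))
    print("mean bias:", mean_bias(reviews, toy_score))
```

Under this sign convention a negative value means the female version was rated less positively than the male version, and a sample without gendered terms contributes a bias of zero. In practice `toy_score` would be replaced by the positive-class probability of the classifier under study.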