Discriminatory gender biases have been found in Pre-trained Language Models (PLMs) for multiple languages. In Natural Language Inference (NLI), existing bias evaluation methods have focused on the prediction results of a specific label out of three labels, such as neutral. However, such evaluation methods can be inaccurate since unique biased inferences are associated with unique prediction labels. Addressing this limitation, we propose a bias evaluation method for PLMs that considers all the three labels of NLI task. We create three evaluation data groups that represent different types of biases. Then, we define a bias measure based on the corresponding label output of each data group. In the experiments, we introduce a meta-evaluation technique for NLI bias measures and use it to confirm that our bias measure can distinguish biased, incorrect inferences from non-biased incorrect inferences better than the baseline, resulting in a more accurate bias evaluation. As we create the datasets in English, Japanese, and Chinese, we also validate the compatibility of our bias measure across multiple languages. Lastly, we observe the bias tendencies in PLMs of each language. To our knowledge, we are the first to construct evaluation datasets and measure PLMs' bias from NLI in Japanese and Chinese.
翻译:歧视性性别偏见已在多种语言的预训练语言模型中被发现。在自然语言推理任务中,现有偏见评估方法仅关注三个标签中特定标签(如中性)的预测结果。然而,此类评估方法可能不准确,因为独特的偏见推断与特定的预测标签相关联。针对这一局限性,我们提出了一种考虑自然语言推理所有三个标签的预训练语言模型偏见评估方法。我们构建了代表不同类型偏见的三个评估数据组,然后根据每个数据组对应的标签输出定义了偏见度量指标。在实验中,我们引入了一种针对自然语言推理偏见度量指标的元评估技术,并验证了我们的偏见度量指标相较于基线方法能更好地区分偏见性错误推断与非偏见性错误推断,从而实现更准确的偏见评估。由于我们创建了英语、日语和中文的数据集,还验证了该偏见度量指标在多种语言间的兼容性。最后,我们观察到各语言预训练语言模型中的偏见倾向。据我们所知,这是首次针对日语和中文构建评估数据集并从自然语言推理角度衡量预训练语言模型的偏见。