Masked Language Models (MLMs) have been successful in many natural language processing tasks. However, because they learn from large text corpora, MLMs are likely to reflect real-world stereotype biases. Most previously proposed evaluation metrics are built on the log-likelihood of MLMs and differ in the masking strategy they adopt. They lack holistic considerations, such as the variance of the scores for stereotype and anti-stereotype samples. In this paper, the log-likelihoods that MLMs assign to stereotype and anti-stereotype samples are modeled as Gaussian distributions. Two evaluation metrics, the Kullback-Leibler Divergence Score (KLDivS) and the Jensen-Shannon Divergence Score (JSDivS), are proposed to evaluate social biases in MLMs. Experimental results on the public datasets StereoSet and CrowS-Pairs demonstrate that KLDivS and JSDivS are more stable and interpretable than previously proposed metrics.
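The abstract does not give the exact definitions of KLDivS and JSDivS, so the following is only a minimal sketch of the underlying idea, not the authors' implementation. It assumes that each group of log-likelihoods is summarized by a univariate Gaussian fitted from its sample mean and population standard deviation. The function names, the direction of the KL divergence, and the example scores are all hypothetical. The KL divergence between two Gaussians has a closed form; the JS divergence does not, so it is approximated here by numerical integration.

```python
import math
from statistics import fmean, pstdev


def fit_gaussian(scores):
    """Return (mean, population std) of a list of log-likelihood scores."""
    return fmean(scores), pstdev(scores)


def kl_gaussian(mu_p, sd_p, mu_q, sd_q):
    """Closed-form KL(P || Q) in nats for univariate Gaussians P and Q."""
    return (
        math.log(sd_q / sd_p)
        + (sd_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sd_q ** 2)
        - 0.5
    )


def normal_pdf(x, mu, sd):
    """Density of N(mu, sd^2) at x."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def js_gaussian(mu_p, sd_p, mu_q, sd_q, steps=20000, width=10.0):
    """JS divergence in nats between two Gaussians (midpoint-rule integration).

    JS(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), with M = 0.5 * (P + Q).
    """
    lo = min(mu_p - width * sd_p, mu_q - width * sd_q)
    hi = max(mu_p + width * sd_p, mu_q + width * sd_q)
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        p = normal_pdf(x, mu_p, sd_p)
        q = normal_pdf(x, mu_q, sd_q)
        m = 0.5 * (p + q)
        # Skip terms whose density underflowed to zero (0 * log 0 := 0).
        if p > 0.0:
            total += 0.5 * p * math.log(p / m)
        if q > 0.0:
            total += 0.5 * q * math.log(q / m)
    return total * dx


# Hypothetical sentence log-likelihoods, for illustration only.
stereo_scores = [-41.2, -39.8, -44.5, -40.1, -42.7]
anti_scores = [-43.0, -42.1, -45.9, -41.6, -44.8]

mu_s, sd_s = fit_gaussian(stereo_scores)
mu_a, sd_a = fit_gaussian(anti_scores)

print("KL(stereo || anti):", kl_gaussian(mu_s, sd_s, mu_a, sd_a))
print("JS(stereo, anti):  ", js_gaussian(mu_s, sd_s, mu_a, sd_a))
```

In this sketch, both divergences are zero when the two fitted Gaussians coincide and grow as the means or variances drift apart, which is one way to account for variance rather than comparing scores pair by pair. The JS divergence is symmetric and bounded above by ln 2, whereas the KL divergence is asymmetric and unbounded.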