Many measures of societal bias in language models have been proposed in recent years. A popular approach is to evaluate model behavior with a set of word-filling prompts. In this work, we analyze the validity of two such measures: StereoSet and CrowS-Pairs. We show that these measures produce unexpected and illogical results when appropriate control group samples are constructed. Based on this, we consider them problematic and argue that their future use should be reconsidered. We propose a way forward with an improved testing protocol. Finally, we introduce a new gender bias dataset for Slovak.