Social bias in Pretrained Language Models (PLMs) affects text generation and other downstream NLP tasks. Existing bias testing methods rely predominantly on manual templates or on expensive crowd-sourced data. We propose a novel AutoBiasTest method that automatically generates sentences for testing bias in PLMs, providing a flexible and low-cost alternative. Our approach uses another PLM for generation and controls the generation of sentences by conditioning on social group and attribute terms. We show that the generated sentences are natural and similar to human-produced content in terms of word length and diversity. We illustrate that larger models used for generation produce estimates of social bias with lower variance. We find that our bias scores are well correlated with those obtained from manual templates, but AutoBiasTest highlights biases not captured by these templates because its test sentences are more diverse and realistic. By automating large-scale test sentence generation, we enable better estimation of underlying bias distributions.
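To make the notions of a bias score and its variance concrete, the following is a minimal sketch and not the AutoBiasTest implementation. It assumes that each generated test sentence comes as a pair (a stereotyped and an anti-stereotyped variant that differ only in the social group term), that the tested PLM assigns each sentence a scalar score such as a log-likelihood, and that the bias score is the fraction of pairs in which the stereotyped variant scores higher. The names `preferences`, `bias_score`, `bootstrap_std`, and `toy_score` are hypothetical, and `toy_score` is only a stand-in for a real model.

```python
import random
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]  # (stereotyped sentence, anti-stereotyped sentence)


def preferences(pairs: Sequence[Pair], score: Callable[[str], float]) -> List[bool]:
    """For each pair, record whether the stereotyped variant scores higher."""
    return [score(stereo) > score(anti) for stereo, anti in pairs]


def bias_score(prefs: Sequence[bool]) -> float:
    """Fraction of pairs where the stereotyped variant is preferred (0.5 = no preference)."""
    return mean(1.0 if p else 0.0 for p in prefs)


def bootstrap_std(prefs: Sequence[bool], n_resamples: int = 200, seed: int = 0) -> float:
    """Spread of the bias score under resampling of the test sentences."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Resample the per-pair outcomes with replacement, keeping the sample size.
        sample = [rng.choice(prefs) for _ in prefs]
        estimates.append(bias_score(sample))
    return pstdev(estimates)


def toy_score(sentence: str) -> float:
    """Placeholder for a PLM sentence score (assumption): shorter sentences score higher."""
    return -float(len(sentence))


if __name__ == "__main__":
    # Hypothetical sentence pairs, written by hand for illustration only.
    pairs = [
        ("He is a doctor.", "She is a doctor."),
        ("She is a nurse.", "He is a nurse."),
    ]
    prefs = preferences(pairs, toy_score)
    print("bias score:", bias_score(prefs))
    print("bootstrap std:", bootstrap_std(prefs))
```

In practice `toy_score` would be replaced by a score from the model under test, and the pairs would come from the sentence generator. With only a handful of sentences the bootstrap spread is large, which is one way to see why generating test sentences at scale helps estimate the underlying bias distribution.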