Pretrained language models (PLMs), such as GPT2, have achieved remarkable empirical performance in text generation tasks. However, because PLMs are pretrained on large-scale natural language corpora, the text they generate may exhibit social bias against disadvantaged demographic groups. To improve the fairness of PLMs in text generation, we propose to minimize the mutual information between the semantics of the generated sentences and their demographic polarity, i.e., the demographic group to which a sentence refers. In this way, the mention of a demographic group (e.g., male or female) is encouraged to be independent of how that group is described in the generated text, thus effectively alleviating the social bias. Moreover, we propose to efficiently estimate an upper bound of this mutual information via importance sampling, leveraging a natural language corpus. We also propose a distillation mechanism that preserves the language modeling ability of the PLMs after debiasing. Empirical results on real-world benchmarks demonstrate that the proposed method yields superior performance in terms of both fairness and language modeling ability.
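To make the quantity being minimized concrete, the following is a minimal sketch of mutual information between two discrete variables, computed exactly from a small joint probability table. It is not the estimator proposed here, which bounds the mutual information from above via importance sampling; the function name `mutual_information` and the two toy tables are hypothetical, with rows standing for the demographic group mentioned and columns for a coarse description category.

```python
import math


def mutual_information(joint):
    """Exact mutual information I(X; Y) in nats for a discrete joint table.

    joint[i][j] = p(X = i, Y = j); entries are non-negative and sum to 1.
    """
    px = [sum(row) for row in joint]        # marginal over rows
    py = [sum(col) for col in zip(*joint)]  # marginal over columns
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0.0:  # zero-probability cells contribute nothing
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi


# Hypothetical toy tables (illustrative values, not data from any benchmark).
# Rows: demographic group mentioned; columns: description category.
independent = [[0.25, 0.25],
               [0.25, 0.25]]
biased = [[0.40, 0.10],
          [0.10, 0.40]]

print(mutual_information(independent))  # zero: description independent of group
print(mutual_information(biased))       # about 0.193 nats: description depends on group
```

In the independent table the joint factorizes into its marginals, so the mutual information is zero; this is the condition the debiasing objective pushes the generated text toward.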