A growing body of literature has shown that powerful NLP systems may encode social biases; however, the political bias of summarization models remains relatively unknown. In this work, we use an entity replacement method to investigate how politicians are portrayed in automatically generated summaries of news articles. We develop a computational framework based on political entities and lexical resources, and use it to assess biases about Donald Trump and Joe Biden in both extractive and abstractive summarization models. We find consistent differences, such as a stronger association of the collective US government (i.e., the administration) with Biden than with Trump. These differences between summaries are most prominent when the entity is heavily featured in the source article. Our systematic characterization provides a framework for future studies of bias in summarization.
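The entity replacement idea can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the name table `SWAP`, the helpers `swap_entities` and `summary_pair`, and the toy `lead_k_summary` function that stands in for a real summarization model are all hypothetical. The sketch swaps the two politicians' names in an article, summarizes both versions, and maps the names in the second summary back so that any remaining difference comes from the summarizer rather than from the names themselves.

```python
import re

# Hypothetical name table; the abstract does not specify the replacement rules.
SWAP = {
    "Donald Trump": "Joe Biden",
    "Joe Biden": "Donald Trump",
    "Trump": "Biden",
    "Biden": "Trump",
}

# Longest names first, so "Donald Trump" is matched before the bare "Trump".
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(map(re.escape, SWAP), key=len, reverse=True)) + r")\b"
)


def swap_entities(text: str) -> str:
    """Swap the two politicians' names in a single pass over the text."""
    return _PATTERN.sub(lambda m: SWAP[m.group(1)], text)


def lead_k_summary(text: str, k: int = 1) -> str:
    """Toy extractive stand-in for a real model: return the first k sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:k])


def summary_pair(article: str, summarize=lead_k_summary):
    """Summarize the original and the entity-swapped article.

    The swapped summary has its names mapped back (an assumed design choice),
    so the two returned strings can be compared directly.
    """
    original = summarize(article)
    replaced = summarize(swap_entities(article))
    return original, swap_entities(replaced)


article = "Donald Trump signed the bill. Biden criticized it."
original, replaced = summary_pair(article)
print(original == replaced)
```

With the position-based toy summarizer the two summaries are identical by construction. Passing a learned summarizer as `summarize` is where differences of the kind described in the abstract could surface.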