We present a study of sentence-level factuality and bias in news articles across domains. Prior work in NLP has mainly focused on predicting the factuality of news reporting at the article level and the political-ideological bias of news media. In contrast, we investigate the effects of framing bias in factual reporting across domains in order to predict factuality and bias at the sentence level, which may explain the overall reliability of an entire document more accurately. First, we manually build a large sentence-level annotated dataset, FactNews, composed of 6,191 sentences from 100 news stories, each covered by three different outlets, for a total of 300 news articles. Second, we study how biased and factual spans surface in news articles from different media outlets and different domains. Finally, we present a baseline model for factual sentence prediction obtained by fine-tuning BERT. We also provide a detailed analysis of the data that demonstrates the reliability of the annotation and the models.