Recent advances in fake news detection have exploited the success of large-scale pre-trained language models (PLMs). The predominant state-of-the-art approaches fine-tune PLMs on labelled fake news datasets. However, large-scale PLMs are generally not trained on structured factual data and hence may not possess priors that are grounded in factually accurate knowledge. The use of existing knowledge bases (KBs) with rich human-curated factual information thus has the potential to make fake news detection more effective and robust. In this paper, we investigate the impact of integrating knowledge into PLMs for fake news detection. We study several state-of-the-art approaches for knowledge integration, mostly using Wikidata as the KB, on two popular fake news datasets: LIAR, a politics-based dataset, and COVID-19, a dataset of messages posted on social media relating to the COVID-19 pandemic. Our experiments show that knowledge-enhanced models can significantly improve fake news detection on LIAR, where the KB is relevant and up to date. The mixed results on COVID-19 highlight the reliance on stylistic features and the importance of domain-specific and current KBs.