Online fake news and misinformation threaten democracy, justice, and public confidence, and they pose particular risks to vulnerable populations; as a result, the need for fake news detection and intervention has increased sharply. Most fake news detection methods, whether multi-modal or purely text-based, depend on textual analysis of entire articles. These methods have several limitations: those that rely on full text can be computationally inefficient, can demand large amounts of training data to achieve competitive accuracy, and may lack robustness across datasets. This is because fake news datasets vary widely in the amount and type of information they provide: some include long passages of text with images and metadata, whereas others contain only a few short sentences. If fake news could be detected using only minimal information, detection methods could become more robust and more resilient to missing information. We aim to overcome these limitations by detecting fake news from systematically selected, limited information that is effective and delivers robust performance. We propose a framework called SLIM (Systematically-selected Limited Information) for fake news detection. In SLIM, we quantify the amount of information using information-theoretic measures. Using limited information, SLIM achieves fake news detection performance comparable to that of state-of-the-art methods that use the full text. Furthermore, by combining various types of limited information, SLIM can perform even better while significantly reducing the quantity of information required for training compared with state-of-the-art language-model-based fake news detection techniques.
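The abstract does not say which information-theoretic measures SLIM uses. As a purely illustrative sketch, one common way to quantify the amount of information in a text span is the Shannon entropy of its token distribution. The function names `shannon_entropy` and `total_information`, the whitespace tokenization, and the choice of entropy itself are assumptions made for this example, not the framework's actual definitions.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy, in bits per token, of the whitespace-token distribution.

    Assumption: lowercase whitespace tokenization stands in for whatever
    preprocessing a real pipeline would apply.
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    # H = -sum(p * log2(p)) over the empirical token probabilities
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def total_information(text: str) -> float:
    """Rough total information in bits: per-token entropy times token count."""
    return shannon_entropy(text) * len(text.split())


if __name__ == "__main__":
    # Hypothetical inputs: a short headline versus a longer body excerpt.
    headline = "Scientists confirm surprising result"
    body = "Scientists confirm the result and the result is confirmed by the team"
    for name, span in [("headline", headline), ("body", body)]:
        print(name, round(shannon_entropy(span), 3), round(total_information(span), 3))
```

A measure of this kind gives a scalar per text span, so different candidate inputs (for example a headline versus a full body) could be compared by how much information each carries relative to the full article.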