In recent years, fake news detection has received increasing attention in public debate and scientific research. Despite advances in detection techniques, the production and spread of false information have become more sophisticated, driven by Large Language Models (LLMs) and the amplification power of social media. We present a critical assessment of 12 representative fake news detection approaches, spanning traditional machine learning, deep learning, transformers, and specialized cross-domain architectures. We evaluate these methods on 10 publicly available datasets that differ in genre, source, topic, and labeling rationale. We address text-only English fake news detection as a binary classification task, harmonizing labels into "Real" and "Fake" to ensure a consistent evaluation protocol. We acknowledge that label semantics vary across datasets and that harmonization inevitably removes these semantic nuances. Each dataset is treated as a distinct domain. We conduct in-domain, multi-domain, and cross-domain experiments to simulate real-world scenarios involving domain shift and out-of-distribution data. Fine-tuned models perform well in-domain but struggle to generalize. Cross-domain architectures can reduce this gap but are data-hungry, while LLMs offer a promising alternative through zero- and few-shot learning. Given inherent dataset confounds and possible pre-training exposure, results should be interpreted as robustness evaluations within this English, text-only protocol.
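The evaluation protocol described above (binary label harmonization, with each dataset treated as its own domain) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the label vocabularies, the helper names `harmonize` and `experiment_plan`, and the reading of "multi-domain" as training on the union of all domains and of "cross-domain" as training on one domain and testing on another. The abstract does not specify these details, so the actual setup may differ.

```python
from itertools import permutations

# Hypothetical source-label vocabularies. The abstract does not list the
# original labels of the ten datasets, so these sets are placeholders.
FAKE_LABELS = {"fake", "false", "unreliable"}
REAL_LABELS = {"real", "true", "reliable"}


def harmonize(label: str) -> str:
    """Map a dataset-specific label onto the binary "Real"/"Fake" scheme."""
    key = label.strip().lower()
    if key in FAKE_LABELS:
        return "Fake"
    if key in REAL_LABELS:
        return "Real"
    # Fail loudly instead of guessing: harmonization already discards nuance,
    # so unmapped labels should be reviewed by hand.
    raise ValueError(f"no binary mapping defined for label {label!r}")


def experiment_plan(domains):
    """Enumerate (setting, train_domains, test_domain) triples.

    Assumed definitions (not confirmed by the abstract):
      in-domain    -> train and test on the same dataset
      multi-domain -> train on the union of all datasets, test on each one
      cross-domain -> train on one dataset, test on a different one
    """
    plan = [("in-domain", (d,), d) for d in domains]
    plan += [("multi-domain", tuple(domains), d) for d in domains]
    plan += [("cross-domain", (src,), tgt) for src, tgt in permutations(domains, 2)]
    return plan


if __name__ == "__main__":
    for setting, train, test in experiment_plan(["A", "B", "C"]):
        print(f"{setting:12s} train={train} test={test}")
```

Under these assumed definitions, ten domains would give 10 in-domain, 10 multi-domain, and 90 cross-domain configurations per model, which shows how quickly the evaluation grid grows as datasets are added.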