The growing proliferation of misinformation and its alarming impact have motivated both industry and academia to develop approaches for fake news detection. However, state-of-the-art approaches are usually trained on small datasets or on datasets restricted to a limited set of topics. As a consequence, these models generalize poorly and are not applicable to real-world data. In this paper, we propose three models that adopt and fine-tune state-of-the-art multimodal transformers for multimodal fake news detection. We conduct an in-depth analysis in which we manipulate the input data to explore the models' performance in realistic use cases on social media. Our study across multiple models demonstrates that these systems suffer significant performance drops on manipulated data. To reduce this bias and improve model generalization, we suggest augmenting the training data so that experiments on fake news detection for social media become more meaningful. The proposed data augmentation techniques enable the models to generalize better and improve on state-of-the-art results.