Fake news often combines multimedia content such as text and images to mislead readers, which helps it spread and widens its influence. Most existing fake news detection methods apply a co-attention mechanism to fuse multimodal features but ignore whether the image and text are consistent with each other. In this paper, we propose multimodal matching-aware co-attention networks with mutual knowledge distillation to improve fake news detection. Specifically, we design an image-text matching-aware co-attention mechanism that captures the alignment between image and text for better multimodal fusion; the image-text matching representation is obtained from a vision-language pre-trained model. Building on this mechanism, we construct two co-attention networks, one centered on text and one centered on image, and train them with mutual knowledge distillation. Extensive experiments on three benchmark datasets demonstrate that the proposed model achieves state-of-the-art performance on multimodal fake news detection.
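The abstract does not give the training objective, so the following is only a minimal sketch of how mutual knowledge distillation between a text-centered and an image-centered network might be written. It assumes a common formulation in which each network minimizes its own cross-entropy plus a KL term pulling it toward the other network's softened predictions. The function names, the weight `alpha`, and the `temperature` are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax with an optional temperature (numerically stable)."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean cross-entropy between logits and integer class labels."""
    p = softmax(logits)
    return float(-np.log(p[np.arange(len(labels)), labels] + 1e-12).mean())

def kl_divergence(p, q):
    """Mean KL(p || q) over the batch; p acts as the teacher distribution."""
    return float((p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=-1).mean())

def mutual_distillation_losses(text_logits, image_logits, labels,
                               alpha=0.5, temperature=2.0):
    """Assumed objective: each network is supervised by the labels and
    additionally mimics the other network's softened class distribution."""
    p_text = softmax(text_logits, temperature)
    p_image = softmax(image_logits, temperature)
    # Text-centered network learns from the image-centered one, and vice versa.
    loss_text = cross_entropy(text_logits, labels) + alpha * kl_divergence(p_image, p_text)
    loss_image = cross_entropy(image_logits, labels) + alpha * kl_divergence(p_text, p_image)
    return loss_text, loss_image

# Toy batch: 2 news items, 2 classes (real / fake); the logits are made up.
text_logits = np.array([[2.0, 0.5], [0.1, 1.5]])
image_logits = np.array([[1.0, 1.2], [0.3, 0.9]])
labels = np.array([0, 1])
loss_text, loss_image = mutual_distillation_losses(text_logits, image_logits, labels)
```

In a real training loop the peer's distribution would typically be treated as a constant (no gradient) when computing each network's distillation term; whether the paper does so is not stated in the abstract.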