Multimodal Misinformation Detection using Large Vision-Language Models

The increasing proliferation of misinformation and its alarming impact have motivated both industry and academia to develop approaches for misinformation detection and fact checking. Large language models (LLMs) have recently shown remarkable performance in various tasks, but whether and how LLMs could help with misinformation detection remains relatively underexplored. Most existing state-of-the-art approaches either do not consider evidence and focus solely on claim-related features, or assume that the evidence is provided. A few approaches consider evidence retrieval as part of misinformation detection, but they rely on fine-tuning models. In this paper, we investigate the potential of LLMs for misinformation detection in a zero-shot setting. We incorporate an evidence retrieval component into the process, as it is crucial to gather pertinent information from various sources to assess the veracity of claims. To this end, we propose a novel re-ranking approach for multimodal evidence retrieval using both LLMs and large vision-language models (LVLMs). The retrieved evidence samples (images and texts) serve as the input for an LVLM-based approach for multimodal fact verification (LVLM4FV). To enable a fair evaluation, we address the issue of incomplete ground truth for evidence samples in an existing evidence retrieval dataset by annotating a more complete set of evidence samples for both image and text retrieval. Our experimental results on two datasets demonstrate the superiority of the proposed approach in both evidence retrieval and fact verification, as well as better generalization across datasets compared to the supervised baseline.
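The two-stage flow described in the abstract (re-rank candidate evidence, then verify the claim against the top-ranked items) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Evidence` type, the `rerank` and `verify_claim` functions, the verdict labels, and the toy scorers are all hypothetical stand-ins. In a real system, each scorer and the verifier would wrap a zero-shot prompted LLM or LVLM call.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Evidence:
    """One candidate evidence item (hypothetical type for illustration)."""
    content: str   # raw text, or a file name standing in for an image
    modality: str  # "text" or "image"


# Hypothetical model interfaces; real ones would call a prompted LLM/LVLM.
Scorer = Callable[[str, Evidence], float]            # relevance to the claim
Verifier = Callable[[str, Sequence[Evidence]], str]  # returns a verdict label


def rerank(claim: str, candidates: Sequence[Evidence],
           text_scorer: Scorer, image_scorer: Scorer,
           top_k: int) -> List[Evidence]:
    """Score each candidate with the scorer for its modality; keep the best top_k."""
    def score(ev: Evidence) -> float:
        scorer = image_scorer if ev.modality == "image" else text_scorer
        return scorer(claim, ev)
    return sorted(candidates, key=score, reverse=True)[:top_k]


def verify_claim(claim: str, candidates: Sequence[Evidence],
                 text_scorer: Scorer, image_scorer: Scorer,
                 verifier: Verifier, top_k: int = 2) -> str:
    """Re-rank the candidates, then pass the top-ranked evidence to the verifier."""
    evidence = rerank(claim, candidates, text_scorer, image_scorer, top_k)
    return verifier(claim, evidence)


# --- Toy stand-ins so the sketch runs without any model (assumptions) ---

def toy_text_scorer(claim: str, ev: Evidence) -> float:
    """Fraction of claim tokens that also occur in the evidence text."""
    claim_tokens = set(claim.lower().split())
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & set(ev.content.lower().split())) / len(claim_tokens)


def toy_image_scorer(claim: str, ev: Evidence) -> float:
    """Constant placeholder; an LVLM would judge image-claim relevance here."""
    return 0.5


def toy_verifier(claim: str, evidence: Sequence[Evidence]) -> str:
    """Placeholder verdict based on token overlap with the text evidence."""
    texts = [ev for ev in evidence if ev.modality == "text"]
    if any(toy_text_scorer(claim, ev) >= 0.5 for ev in texts):
        return "supported"
    return "not enough evidence"
```

Passing the scorers and the verifier as plain callables keeps the pipeline independent of any particular model, which suits a zero-shot setting where no component is fine-tuned.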
