Automatic detection of multimodal misinformation has gained widespread attention recently. However, the potential of powerful Large Language Models (LLMs) for multimodal misinformation detection remains underexplored. Moreover, how to teach LLMs to interpret multimodal misinformation in a cost-effective and accessible way is still an open question. To address this, we propose MMIDR, a framework designed to teach LLMs to provide fluent, high-quality textual explanations of their decisions when detecting multimodal misinformation. To convert multimodal misinformation into an appropriate instruction-following format, we present a data augmentation perspective and pipeline. This pipeline consists of a visual information processing module and an evidence retrieval module. Subsequently, we prompt proprietary LLMs with the processed content to extract rationales for interpreting the authenticity of multimodal misinformation. Furthermore, we design an efficient knowledge distillation approach to distill the capability of proprietary LLMs to explain multimodal misinformation into open-source LLMs. To explore several research questions regarding the performance of LLMs in multimodal misinformation detection tasks, we construct an instruction-following multimodal misinformation dataset and conduct comprehensive experiments. The experimental findings reveal that MMIDR achieves sufficient detection performance and can provide compelling rationales to support its assessments.
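The conversion into an instruction-following format described above can be illustrated with a minimal sketch. It is not the MMIDR implementation: the class `ProcessedSample`, the functions `build_prompt` and `to_training_record`, the instruction wording, the field names, and the label strings are all hypothetical, chosen only to show how a claim, the text derived from its image, and retrieved evidence might be combined into one record whose target is a rationale obtained from a proprietary (teacher) LLM.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProcessedSample:
    """Hypothetical container for one image-text pair after preprocessing."""
    claim_text: str                 # text that accompanies the image
    image_description: str          # assumed output of the visual processing module
    evidence: List[str] = field(default_factory=list)  # assumed output of evidence retrieval
    label: str = "unknown"          # ground-truth label (label names are illustrative)


# Hypothetical instruction; the wording used in the paper is not given in the abstract.
INSTRUCTION = (
    "Decide whether the following image-text pair is misinformation "
    "and explain your decision."
)


def build_prompt(sample: ProcessedSample) -> str:
    """Flatten a processed sample into the text shown to an LLM."""
    evidence_block = "\n".join(
        f"[{i}] {snippet}" for i, snippet in enumerate(sample.evidence, start=1)
    ) or "(none retrieved)"
    return (
        f"Claim: {sample.claim_text}\n"
        f"Image content: {sample.image_description}\n"
        f"Evidence:\n{evidence_block}"
    )


def to_training_record(sample: ProcessedSample, teacher_rationale: str) -> Dict[str, str]:
    """Pair the prompt with a teacher-written rationale for fine-tuning a student model."""
    return {
        "instruction": INSTRUCTION,
        "input": build_prompt(sample),
        "output": f"Label: {sample.label}\nRationale: {teacher_rationale.strip()}",
    }
```

Records of this shape could then serve as supervised fine-tuning data for an open-source model, which is one common way to realize the distillation step; the abstract does not specify the exact procedure.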