Online misinformation is often multimodal in nature, i.e., it arises from misleading associations between texts and their accompanying images. To support the fact-checking process, researchers have recently been developing automatic multimodal methods that gather and analyze external information (evidence) related to the image-text pairs under examination. However, prior works have assumed that all collected evidence is relevant. In this study, we introduce a "Relevant Evidence Detection" (RED) module that discerns whether each piece of evidence is relevant for supporting or refuting the claim. Specifically, we develop the "Relevant Evidence Detection Directed Transformer" (RED-DOT) and explore multiple architectural variants (e.g., single- or dual-stage) and mechanisms (e.g., "guided attention"). Extensive ablation and comparative experiments demonstrate that RED-DOT achieves significant improvements of up to 28.5% over the state of the art on the VERITE benchmark. Furthermore, evidence re-ranking and element-wise modality fusion allow RED-DOT to achieve competitive, and in some cases improved, performance on NewsCLIPings+ without requiring numerous evidence items or multiple backbone encoders. Finally, our qualitative analysis indicates that the proposed "guided attention" module has the potential to enhance the architecture's interpretability. We release our code at: https://github.com/stevejpapad/relevant-evidence-detection
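The abstract names evidence re-ranking, element-wise modality fusion, and per-item relevance detection without detailing them. The following is a minimal, hypothetical Python sketch of how these three steps could fit together, assuming fixed-size embeddings from some pretrained image and text encoder are already available. The function names, the cosine-similarity ranking, the `[x, y, x*y, x-y]` fusion layout, and the fixed threshold are illustrative assumptions, not RED-DOT's actual implementation. In RED-DOT the relevance decision is made by a learned module inside the Transformer, which `flag_relevant` only stands in for.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank_evidence(claim: Vector, evidence: List[Vector], top_k: int) -> List[Tuple[int, float]]:
    """Hypothetical re-ranking: sort evidence by similarity to the claim and keep the top_k items."""
    scored = [(i, cosine(claim, item)) for i, item in enumerate(evidence)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def elementwise_fusion(img: Vector, txt: Vector) -> List[float]:
    """Assumed fusion layout: concatenate image, text, their element-wise product and difference."""
    if len(img) != len(txt):
        raise ValueError("modality embeddings must have the same dimension")
    product = [a * b for a, b in zip(img, txt)]
    difference = [a - b for a, b in zip(img, txt)]
    return list(img) + list(txt) + product + difference


def flag_relevant(scores: List[Tuple[int, float]], threshold: float = 0.5) -> List[Tuple[int, bool]]:
    """Stand-in for a learned RED head: an item counts as relevant if its score clears the threshold."""
    return [(i, score >= threshold) for i, score in scores]


if __name__ == "__main__":
    claim = [1.0, 0.0]
    evidence = [[0.0, 1.0], [1.0, 0.1], [0.7, 0.7]]
    top = rerank_evidence(claim, evidence, top_k=2)
    print([i for i, _ in top])                        # [1, 2]
    print(flag_relevant(top, threshold=0.8))          # [(1, True), (2, False)]
    print(elementwise_fusion([1.0, 2.0], [3.0, 4.0]))  # [1.0, 2.0, 3.0, 4.0, 3.0, 8.0, -2.0, -2.0]
```

Keeping only the top-ranked items is one plausible reading of how re-ranking lets a model work with fewer evidence items; the element-wise product and difference terms give a downstream classifier explicit signals of agreement and disagreement between the two modalities.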