Verifying the truthfulness of claims usually requires joint multi-modal reasoning over both textual and visual evidence, e.g., analyzing a textual caption together with a chart image. In addition, to make the reasoning process transparent, a textual explanation is needed to justify the verification result. However, most existing claim verification works focus only on reasoning over textual evidence or ignore explainability, resulting in inaccurate and unconvincing verification. To address these problems, we propose a novel model that jointly performs evidence retrieval, multi-modal claim verification, and explanation generation. For evidence retrieval, we construct a two-layer multi-modal graph over claims and evidence, and design image-to-text and text-to-image reasoning for multi-modal retrieval. For claim verification, we propose token- and evidence-level fusion to integrate claim and evidence embeddings for multi-modal verification. For explanation generation, we introduce a multi-modal Fusion-in-Decoder for explainability. Finally, since almost all existing datasets are in the general domain, we create AIChartClaim, a scientific dataset in the AI domain, to complement the claim verification community. Experiments demonstrate the strength of our model.
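The abstract does not specify how token- and evidence-level fusion are computed, so the following is only a minimal NumPy sketch of one plausible reading: token-level fusion as cross-attention from claim tokens to evidence tokens, and evidence-level fusion as a similarity-weighted pooling of per-evidence embeddings. All function names and shapes here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def token_level_fusion(claim_tokens, evidence_tokens):
    # Hypothetical token-level fusion: each claim token attends
    # over all evidence tokens (scaled dot-product cross-attention).
    d = claim_tokens.shape[-1]
    scores = claim_tokens @ evidence_tokens.T / np.sqrt(d)
    attn = softmax(scores, axis=-1)
    return attn @ evidence_tokens  # shape: (n_claim_tokens, d)

def evidence_level_fusion(claim_vec, evidence_vecs):
    # Hypothetical evidence-level fusion: weight each piece of
    # evidence by its similarity to the pooled claim embedding.
    weights = softmax(evidence_vecs @ claim_vec)
    return (weights[:, None] * evidence_vecs).sum(axis=0)  # shape: (d,)

rng = np.random.default_rng(0)
claim_tokens = rng.normal(size=(5, 8))     # 5 claim tokens, dim 8
evidence_tokens = rng.normal(size=(7, 8))  # 7 evidence tokens (text + image)
fused_tokens = token_level_fusion(claim_tokens, evidence_tokens)

claim_vec = claim_tokens.mean(axis=0)      # simple mean pooling of the claim
evidence_vecs = rng.normal(size=(3, 8))    # 3 retrieved evidence embeddings
fused_evidence = evidence_level_fusion(claim_vec, evidence_vecs)
```

In this reading, `fused_tokens` would feed a fine-grained verification head while `fused_evidence` gives a single vector summarizing all retrieved evidence; the paper's actual fusion may differ.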