In multimodal fact checking, the accuracy with which evidence is retrieved from different modalities has a significant impact on downstream claim verification. Existing general-purpose multimodal retrieval methods are usually built on semantic similarity, so the evidence they retrieve is often similar to the claim without being relevant to it. This paper proposes a \textbf{D}ynamic \textbf{A}daptive \textbf{C}ontrastive \textbf{L}earning method for evidence \textbf{R}etrieval, called DACLR, to address this issue. DACLR first uses a Multimodal Large Language Model (MLLM) to convert multimodal evidence and claims uniformly into the text modality and extracts event-level features from them. It then retrieves evidence with a two-stage recall-rerank pipeline. DACLR strengthens the model's event awareness in the retrieval stage by optimizing the contrastive loss and mining hard negative samples. Specifically, DACLR designs three loss functions at two levels (semantic and event) based on the InfoNCE loss, and sets up three corresponding sets of hard negative sample candidates. The model dynamically adjusts the ratio based on an accuracy supervision signal computed over intra-batch samples, which allows it to learn the correlation between claims and positive samples at the event level without forgetting its semantic retrieval ability. Extensive comparison and ablation experiments demonstrate the effectiveness of DACLR and of its internal optimization methods. Further analysis also shows the advantages of DACLR in multimodal evidence retrieval.
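The training idea described in the abstract (an InfoNCE loss over hard negatives whose ratio is adjusted from in-batch accuracy) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the abstract does not say what the ratio is a ratio of, so the sketch assumes it is the mix of hard negatives; it uses only two pools (semantic and event) rather than DACLR's three; and the function names, the temperature, the `target` threshold and the `step` size are hypothetical.

```python
import math


def info_nce(pos_score, neg_scores, temperature=0.05):
    """InfoNCE loss for one claim: negative log-softmax of the positive score."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    peak = max(logits)  # subtract the max before exponentiating, for stability
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[0]


def update_event_ratio(event_ratio, batch_accuracy, target=0.8, step=0.1):
    """Hypothetical rule (not from the paper): shift weight toward event-level
    negatives once in-batch accuracy reaches `target`, otherwise shift back
    toward semantic negatives. The result is clipped to [0, 1]."""
    if batch_accuracy >= target:
        event_ratio += step
    else:
        event_ratio -= step
    return min(1.0, max(0.0, event_ratio))


def mix_negatives(semantic_negs, event_negs, event_ratio, k):
    """Pick k negative scores, a fraction `event_ratio` of them from the event pool."""
    n_event = min(len(event_negs), round(event_ratio * k))
    return event_negs[:n_event] + semantic_negs[: k - n_event]


# Toy similarity scores between one claim and its candidates (made-up numbers).
semantic_negs = [0.2, 0.1, 0.05, 0.0]
event_negs = [0.7, 0.6, 0.5, 0.4]

ratio = update_event_ratio(0.5, batch_accuracy=0.9)
loss = info_nce(0.8, mix_negatives(semantic_negs, event_negs, ratio, k=4))
```

In this sketch, a batch on which the model already retrieves accurately raises the share of event-level negatives, which makes the contrastive task harder at the event level, while a drop in accuracy shifts the mix back toward semantic negatives. This is one plausible reading of "learning event-level correlation without forgetting semantic retrieval", not a description of the authors' actual schedule.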