Multimodal misinformation increasingly mixes realistic image edits with fluent but misleading text, producing persuasive posts that are difficult to verify. Existing systems usually rely on a single evidence source: content-based detectors identify local inconsistencies within an image and its caption but cannot determine global factual truth, while retrieval-based fact-checkers reason over external evidence but treat inputs as coarse claims and often miss subtle visual or textual manipulations. This separation creates failure cases in which internally consistent fabrications bypass manipulation detectors and fact-checkers validate claims that contain pixel-level or token-level corruption. We present D-SECURE, a framework that combines internal manipulation detection with external evidence-based reasoning for news-style posts. D-SECURE integrates the HAMMER manipulation detector with the DEFAME retrieval pipeline: DEFAME performs broad verification, and HAMMER analyses residual or uncertain cases that may contain fine-grained edits. Experiments on DGM4 and ClaimReview samples highlight the complementary strengths of the two systems and motivate their fusion. The framework produces a unified, explainable report that incorporates both manipulation cues and external evidence.
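The triage flow sketched in the abstract (broad verification first, fine-grained manipulation analysis on residual or uncertain cases, then a fused report) can be illustrated as follows. This is a minimal sketch, not the actual D-SECURE implementation: `defame_verify` and `hammer_detect` are hypothetical placeholders standing in for the DEFAME retrieval pipeline and the HAMMER detector, and the fusion step is simplified to concatenating evidence and cues.

```python
from dataclasses import dataclass, field

@dataclass
class Report:
    """Unified, explainable output combining both evidence sources."""
    verdict: str                        # "supported", "refuted", or "uncertain"
    manipulation_flagged: bool          # True if fine-grained edits were found
    evidence: list = field(default_factory=list)  # evidence snippets + manipulation cues

def defame_verify(image, caption):
    # Placeholder for the DEFAME retrieval pipeline: returns a verdict
    # for the (image, caption) claim plus supporting external evidence.
    return "uncertain", []

def hammer_detect(image, caption):
    # Placeholder for the HAMMER detector: returns whether pixel- or
    # token-level manipulations were flagged, plus localisation cues.
    return False, []

def d_secure(image, caption):
    """Run broad evidence-based verification first; escalate residual
    or uncertain cases to manipulation analysis, then fuse results."""
    verdict, evidence = defame_verify(image, caption)
    manipulated, cues = False, []
    if verdict == "uncertain":
        # Residual case: check for fine-grained internal edits.
        manipulated, cues = hammer_detect(image, caption)
    return Report(verdict, manipulated, evidence + cues)
```

The key design point is the ordering: retrieval-based verification handles the broad factual check, and the (typically more expensive) manipulation detector is reserved for posts that external evidence alone cannot resolve.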