Multimodal misinformation increasingly leverages visual persuasion, where repurposed or manipulated images reinforce misleading text. We introduce \textbf{RW-Post}, a post-aligned \textbf{text--image benchmark} for real-world multimodal fact-checking with \emph{auditable} annotations: each instance pairs the original social-media post with reasoning traces and explicitly linked evidence items, derived from human fact-check articles via an LLM-assisted extraction-and-auditing pipeline. RW-Post supports controlled evaluation across closed-book, evidence-bounded, and open-web regimes, enabling systematic diagnosis of visual grounding and evidence utilization. We provide \textbf{AgentFact} as a reference verification baseline and benchmark strong open-source large vision-language models (LVLMs) under unified protocols. Experiments show substantial headroom: current models struggle with faithful evidence grounding, while evidence-bounded evaluation improves both accuracy and faithfulness. Code and dataset will be released at https://github.com/xudanni0927/AgentFact.
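To make the three evaluation regimes more concrete, here is a minimal Python sketch. All of its names are hypothetical: the `Instance` and `EvidenceItem` fields, the `Regime` enum, `build_context`, and `cited_evidence_precision`. They do not reflect the released RW-Post schema or the AgentFact interface. The sketch shows only how the regimes might differ in what a verifier is allowed to see. It also includes a simple citation-precision proxy for evidence grounding, which is an assumption and not the faithfulness metric used in the experiments.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Regime(Enum):
    # Hypothetical encoding of the three evaluation regimes.
    CLOSED_BOOK = "closed_book"            # verifier sees only the post (text + image)
    EVIDENCE_BOUNDED = "evidence_bounded"  # verifier also sees the annotated evidence items
    OPEN_WEB = "open_web"                  # verifier may retrieve its own evidence


@dataclass
class EvidenceItem:
    # Hypothetical fields for one evidence item extracted from a fact-check article.
    evidence_id: str
    text: str
    source_url: str


@dataclass
class Instance:
    # Hypothetical record pairing a post with its reasoning trace and linked evidence.
    post_text: str
    image_path: str
    label: str
    reasoning_trace: List[str]
    evidence: List[EvidenceItem] = field(default_factory=list)


def build_context(inst: Instance, regime: Regime) -> dict:
    """Assemble the inputs a verifier may use under the given regime."""
    context = {"post_text": inst.post_text, "image_path": inst.image_path}
    if regime is Regime.EVIDENCE_BOUNDED:
        # Only the annotated evidence is exposed, identified by ID so it can be cited.
        context["evidence"] = [(e.evidence_id, e.text) for e in inst.evidence]
    elif regime is Regime.OPEN_WEB:
        # The verifier is permitted to search; no gold evidence is provided.
        context["allow_web_search"] = True
    return context


def cited_evidence_precision(cited_ids: List[str], inst: Instance) -> float:
    """Fraction of evidence IDs cited by a model that match the instance's linked evidence."""
    if not cited_ids:
        return 0.0
    gold = {e.evidence_id for e in inst.evidence}
    return sum(cid in gold for cid in cited_ids) / len(cited_ids)
```

For example, a model that cites `["e1", "e3"]` on an instance whose linked evidence is `e1` and `e2` would score 0.5 on this proxy. Keeping evidence IDs explicit in the evidence-bounded context is what makes this kind of citation check possible.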