The rapid spread of multimodal misinformation on social media has raised growing concern, yet research on video misinformation detection remains limited by the lack of large-scale, diverse datasets. Existing methods often overfit to rigid templates and lack deep reasoning over deceptive content. To address these challenges, we introduce FakeVV, a large-scale benchmark comprising over 100,000 video-text pairs with fine-grained, interpretable annotations. We further propose Fact-R1, a novel framework that integrates deep reasoning with collaborative rule-based reinforcement learning. Fact-R1 is trained through a three-stage process: (1) misinformation long-Chain-of-Thought (CoT) instruction tuning, (2) preference alignment via Direct Preference Optimization (DPO), and (3) Group Relative Policy Optimization (GRPO) with a novel verifiable reward function. This training enables Fact-R1 to exhibit emergent reasoning behaviors comparable to those observed in advanced text-based reinforcement learning systems, but in the more complex setting of multimodal misinformation. Our work establishes a new paradigm for misinformation detection, bridging large-scale video understanding, reasoning-guided alignment, and interpretable verification.
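For concreteness, stage (2) presumably instantiates the standard DPO objective of Rafailov et al. (2023); writing $\pi_\theta$ for the policy and $\pi_{\mathrm{ref}}$ for the frozen reference model, with $y_w$ and $y_l$ the preferred and dispreferred reasoning traces:

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

where $\beta$ controls how far the policy may drift from the reference.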
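Stage (3)'s verifiable reward is not specified here; below is a minimal sketch of how a rule-based reward of this kind is commonly built for GRPO, assuming a DeepSeek-R1-style `<think>`/`<answer>` output format and a label-match accuracy term. The tag format, weights, and function names are illustrative assumptions, not Fact-R1's actual implementation.

```python
import re

# Assumed output format: reasoning inside <think>...</think>, verdict inside <answer>...</answer>.
THINK_ANSWER = re.compile(r"<think>.+?</think>\s*<answer>(.+?)</answer>", re.DOTALL)

def verifiable_reward(completion: str, gold_label: str) -> float:
    """Score a rollout: format compliance plus label correctness (weights are assumptions)."""
    match = THINK_ANSWER.search(completion)
    if match is None:
        return 0.0                      # malformed output earns nothing
    format_reward = 0.5                 # well-formed <think>/<answer> structure
    predicted = match.group(1).strip().lower()
    accuracy_reward = 1.0 if predicted == gold_label.lower() else 0.0
    return format_reward + accuracy_reward

def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO's group-relative advantage: normalize rewards across rollouts
    sampled for the same prompt, so no learned value function is needed."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

Because the reward is computed purely by rules (regex format check plus exact label match), it is verifiable: identical rollouts always receive identical scores, avoiding reward-model noise.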