The advent of multi-modal large language models (MLLMs) has greatly advanced research on video fake news detection (VFND). Existing benchmarks typically focus on detection accuracy but fail to provide fine-grained assessments of the entire detection process. To address this limitation, we introduce {POVFNDB (Process-oriented Video Fake News Detection Benchmark)}, a benchmark comprising 10 tasks designed to systematically evaluate MLLMs' perception, understanding, and reasoning capabilities in VFND. The benchmark contains \textit{36,240} human-annotated question-answer (QA) pairs in structured or open-ended formats, spanning 15 distinct evaluation dimensions that characterize different aspects of the video fake news detection process. Using POVFNDB, we conduct comprehensive evaluations of both proprietary and open-source MLLMs. Moreover, we establish a strong baseline by fine-tuning Qwen2.5VL-7B-Instruct on process-oriented chain-of-thought (CoT) data constructed with our proposed POVFND-CoT framework, achieving state-of-the-art performance on VFND.