News media, especially video news, have penetrated every aspect of daily life, which also raises the risk of fake news. Multimodal fake news detection has therefore received growing attention. However, few fake news detection datasets cover the video modality, and the existing ones are composed of unofficial, user-uploaded videos, so they contain a large amount of useless data. To address this problem, we present Official-NV, a dataset consisting of officially published news videos from Xinhua. We crawled videos from Xinhua and then extended the dataset using LLM generation and manual modification. In addition, we benchmark the dataset with a baseline model to demonstrate the advantage of Official-NV for multimodal fake news detection.