The internet gives people an open platform to express their views and share their stories. While this openness is valuable, it has also made fake news one of society's most pressing problems. Manual fact checking is time consuming, which makes it challenging to disprove misleading assertions before they cause significant harm. This is driving interest in automatic fact or claim verification. Some existing datasets aim to support the development of automated fact-checking techniques; however, most of them are text based. Multi-modal fact verification has received relatively scant attention. In this paper, we provide a multi-modal fact-checking dataset called FACTIFY 2, which improves on FACTIFY 1 by using new data sources and adding satire articles. FACTIFY 2 has 50,000 new data instances. Similar to FACTIFY 1, we have three broad categories: support, no-evidence, and refute, with sub-categories based on the entailment of visual and textual data. We also provide a BERT and Vision Transformer based baseline, which achieves a 65% F1 score on the test set. The baseline code and the dataset will be made available at https://github.com/surya1701/Factify-2.0.