The ability to share images, videos, and audio files on social media opens up new possibilities for distinguishing between false information and fake news on the Internet. Because of the vast amount of data shared every second on social media, not all of it can be verified by a computer or a human expert. A check-worthiness analysis can therefore serve as the first step in the fact-checking pipeline and as a filtering mechanism that improves efficiency. This paper proposes a novel way of detecting check-worthiness in multi-modal tweets. It takes advantage of two classifiers, each trained on a single modality. For image data, extracting the embedded text with OCR has been shown to perform best. By combining the two classifiers, the proposed solution placed first in the CheckThat! 2023 Task 1A, achieving an F1 score of 0.7297 on the private test set.
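To make the idea of combining two single-modality classifiers concrete, the following is a minimal sketch of a late-fusion step. The abstract does not state how the two classifiers are combined, so the weighted average, the 0.5 threshold, and all names here (`Tweet`, `fuse`, `predict`, `f1_score`, the toy scorers) are assumptions for illustration and not the authors' method. The OCR text is assumed to have been extracted beforehand.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical type: any function mapping a string to P(check-worthy) in [0, 1].
Scorer = Callable[[str], float]


@dataclass
class Tweet:
    text: str      # the tweet's own text
    ocr_text: str  # text read from the attached image (assumed precomputed by OCR)


def fuse(p_text: float, p_image: float, weight: float = 0.5) -> float:
    """Weighted average of the two single-modality probabilities.

    This fusion rule is an assumption; the source only says the
    classifiers are combined.
    """
    return weight * p_text + (1.0 - weight) * p_image


def predict(tweets: Sequence[Tweet], text_clf: Scorer, image_clf: Scorer,
            threshold: float = 0.5, weight: float = 0.5) -> List[bool]:
    """Label a tweet check-worthy when the fused score reaches the threshold."""
    return [
        fuse(text_clf(t.text), image_clf(t.ocr_text), weight) >= threshold
        for t in tweets
    ]


def f1_score(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """F1 of the positive (check-worthy) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def toy_scorer(s: str) -> float:
    """Stand-in for a trained classifier: text containing digits scores high."""
    return 0.9 if any(ch.isdigit() for ch in s) else 0.2


if __name__ == "__main__":
    tweets = [
        Tweet("Unemployment fell 5 points last year", "no numbers here"),
        Tweet("Good morning everyone", "have a nice day"),
        Tweet("Look at this chart", "Turnout: 73 percent"),
    ]
    labels = predict(tweets, toy_scorer, toy_scorer)
    print(labels, f1_score([True, False, True], labels))
```

In practice the two scorers would be trained models, one on the tweet text and one on the OCR output, and the weight and threshold would be tuned on a development set.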