Automated fact-checking has drawn considerable attention over the past few decades due to the increasing spread of misinformation on online platforms. It is often carried out as a sequence of two tasks: (i) detecting sentences circulating on online platforms that constitute claims needing verification, followed by (ii) verifying those claims. This survey focuses on the former, discussing existing efforts to detect claims that need fact-checking, with a particular focus on multilingual data and methods. This is a challenging and fertile direction in which existing methods are still far from matching human performance, owing to the inherent difficulty of the problem. In particular, the dissemination of information across multiple social platforms, in multiple languages and modalities, demands more generalized solutions for combating misinformation. Focusing on multilingual misinformation, we present a comprehensive survey of state-of-the-art multilingual claim detection research, organized around three key factors of the problem: verifiability, priority, and similarity. Further, we present a detailed overview of existing multilingual datasets, discuss the open challenges, and suggest possible future directions.