With the rise of generative AI, automated fact-checking methods to combat misinformation are becoming increasingly important. However, factual claim detection, the first step in a fact-checking pipeline, suffers from two key issues that limit its scalability and generalizability: (1) inconsistency in how the task, and what counts as a claim, are defined, and (2) the high cost of manual annotation. To address (1), we review the definitions in related work and propose a unifying definition of factual claims that focuses on verifiability. To address (2), we introduce AFaCTA (Automatic Factual Claim deTection Annotator), a novel framework that assists in the annotation of factual claims with the help of large language models (LLMs). AFaCTA calibrates its annotation confidence via consistency across three predefined reasoning paths. Extensive evaluation and experiments in the domain of political speech reveal that AFaCTA can efficiently assist experts in annotating factual claims and training high-quality classifiers, and can work with or without expert supervision. Our analyses also yield PoliClaim, a comprehensive claim detection dataset spanning diverse political topics.
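The core idea of consistency-based confidence calibration can be illustrated with a minimal sketch. This is not the paper's actual implementation: the function name, the binary label aggregation, and the majority-vote tie-break are all illustrative assumptions; the sketch only shows how agreement among independent reasoning paths can serve as a confidence score.

```python
from typing import List, Tuple

def calibrate_confidence(path_labels: List[bool]) -> Tuple[bool, float]:
    """Aggregate binary claim/no-claim labels from multiple LLM reasoning
    paths into a majority label plus an agreement-based confidence score.

    Hypothetical illustration: AFaCTA-style calibration would use three
    predefined reasoning paths; the aggregation details here are assumed.
    """
    votes = sum(path_labels)          # number of paths voting "factual claim"
    n = len(path_labels)
    label = votes * 2 >= n            # simple majority (ties break toward True)
    confidence = max(votes, n - votes) / n  # fraction of paths that agree
    return label, confidence

# Example: two of three hypothetical reasoning paths say a sentence
# contains a verifiable factual claim.
label, conf = calibrate_confidence([True, True, False])
# label is True; conf is 2/3, i.e. only moderately confident,
# so such a sentence could be routed to an expert for review.
```

Under this scheme, unanimous paths yield confidence 1.0 and can be auto-annotated, while split votes flag sentences for expert supervision, matching the abstract's claim that the framework works with or without expert input.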