Verifiable claim detection asks whether a claim expresses a factual statement that can, in principle, be assessed against external evidence. As an early filtering stage in automated fact-checking, it reduces the burden on downstream verification components. However, existing approaches to claim detection, whether based on check-worthiness or verifiability, rely solely on the claim text. This is a notable limitation for verifiable claim detection in particular, where deciding whether a claim is checkable may benefit from knowing which entities and events it refers to and whether relevant information exists to support verification. Inspired by the established role of evidence retrieval in later-stage claim verification, we propose Context-Driven Claim Detection (ContextClaim), a paradigm that moves retrieval forward to the detection stage. ContextClaim extracts entity mentions from the input claim, retrieves relevant information from Wikipedia as a structured knowledge source, and uses large language models to produce concise contextual summaries for downstream classification. We evaluate ContextClaim on two datasets covering different topics and text genres, the CheckThat! 2022 COVID-19 Twitter dataset and the PoliClaim political debate dataset, across encoder-only and decoder-only models under fine-tuning, zero-shot, and few-shot settings. Results show that context augmentation can improve verifiable claim detection, although its effectiveness varies across domains, model architectures, and learning settings. Through component analysis, human evaluation, and error analysis, we further examine when and why the retrieved context contributes to more reliable verifiability judgments.
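The three stages described above (entity extraction, retrieval, summarization) can be illustrated with a minimal sketch. It is not the authors' implementation: every function name is hypothetical, the capitalized-token regex stands in for a real entity extractor, an in-memory dictionary stands in for Wikipedia, word truncation stands in for the LLM summarizer, and the `[SEP]` separator is an assumed way of joining claim and context for a classifier.

```python
import re
from typing import Callable, Dict, List

# All names here are hypothetical illustrations, not part of ContextClaim itself.


def extract_entities(claim: str) -> List[str]:
    """Toy extractor: runs of capitalized tokens (stand-in for a real NER model)."""
    return re.findall(r"[A-Z][\w-]*(?:\s+[A-Z][\w-]*)*", claim)


def retrieve_context(entities: List[str], knowledge_base: Dict[str, str]) -> List[str]:
    """Look up each entity in a local dict (stand-in for Wikipedia retrieval)."""
    return [knowledge_base[e] for e in entities if e in knowledge_base]


def summarize(passages: List[str], max_words: int = 30) -> str:
    """Truncate the joined passages to max_words (stand-in for an LLM summary)."""
    words = " ".join(passages).split()
    return " ".join(words[:max_words])


def augment_claim(
    claim: str,
    knowledge_base: Dict[str, str],
    summarizer: Callable[[List[str]], str] = summarize,
    sep: str = " [SEP] ",  # assumed separator; the real input format may differ
) -> str:
    """Return the claim followed by a context summary, or the bare claim if nothing is retrieved."""
    passages = retrieve_context(extract_entities(claim), knowledge_base)
    if not passages:
        return claim
    return claim + sep + summarizer(passages)


if __name__ == "__main__":
    kb = {
        "Pfizer": "Pfizer is a pharmaceutical company.",
        "Germany": "Germany is a country in Europe.",
    }
    print(augment_claim("Pfizer announced a vaccine trial in Germany.", kb))
```

The augmented string would then be passed to whichever classifier is under study. Falling back to the bare claim when retrieval returns nothing is a design choice of this sketch, so that claims without recognizable entities are still classified.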