Trustworthiness is a core research challenge for agentic AI systems built on Large Language Models (LLMs). To enhance trust, natural language claims from diverse sources, including human-written text, web content, and model outputs, are commonly checked for factuality by retrieving external knowledge and using an LLM to verify the faithfulness of claims to the retrieved evidence. As a result, such methods are constrained by retrieval errors and external data availability, while leaving the model's intrinsic fact-verification capabilities largely unused. We propose the task of fact-checking without retrieval, focusing on the verification of arbitrary natural language claims, independent of their source. To study this setting, we introduce a comprehensive evaluation framework focused on generalization, testing robustness to (i) long-tail knowledge, (ii) variation in claim sources, (iii) multilinguality, and (iv) long-form generation. Across 9 datasets, 18 methods, and 3 models, our experiments indicate that logit-based approaches often underperform compared to those that leverage internal model representations. Building on this finding, we introduce INTRA, a method that exploits interactions between internal representations and achieves state-of-the-art performance with strong generalization. More broadly, our work establishes fact-checking without retrieval as a promising research direction that can complement retrieval-based frameworks, improve scalability, and enable the use of such systems as reward signals during training or as components integrated into the generation process.
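The contrast between the two families of retrieval-free signals mentioned above can be illustrated with a minimal sketch. This is not the paper's INTRA method: the functions `logit_confidence` and `fit_probe`, the synthetic data, and all shapes are illustrative assumptions standing in for real LLM outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Logit-based signal: mean log-probability of the claim's tokens ---
def logit_confidence(token_logits, token_ids):
    """Average log-prob the model assigns to the claim's own tokens."""
    z = token_logits - token_logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return float(log_probs[np.arange(len(token_ids)), token_ids].mean())

# --- Representation-based signal: linear probe on hidden states ---
def fit_probe(hidden, labels, lr=0.1, steps=500):
    """Logistic-regression probe trained on (pooled) hidden states."""
    w = np.zeros(hidden.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(hidden @ w + b)))
        g = p - labels  # gradient of the logistic loss
        w -= lr * hidden.T @ g / len(labels)
        b -= lr * g.mean()
    return w, b

# Synthetic demo: hidden states whose first coordinate weakly encodes
# claim truthfulness, mimicking a factuality direction in activation space.
n, d, vocab, seq = 200, 16, 50, 8
labels = rng.integers(0, 2, n).astype(float)
hidden = rng.normal(size=(n, d)) + np.outer(labels - 0.5, np.eye(d)[0]) * 3
w, b = fit_probe(hidden, labels)
probe_acc = (((hidden @ w + b) > 0) == labels).mean()

# Logit signal on a synthetic claim: one score per claim, no probe needed.
logits = rng.normal(size=(seq, vocab))
ids = rng.integers(0, vocab, seq)
score = logit_confidence(logits, ids)
print(f"probe accuracy: {probe_acc:.2f}, logit score: {score:.2f}")
```

The sketch highlights the structural difference: the logit signal is a single scalar read off the output distribution, while the representation-based signal requires supervised access to internal activations but can exploit directions in hidden space that the output distribution does not expose.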