Large language models (LLMs) often hallucinate, yet most existing fact-checking methods treat factuality evaluation as a binary classification problem, offering limited interpretability and failing to capture fine-grained error types. In this paper, we introduce InFi-Check, a framework for interpretable and fine-grained fact-checking of LLM outputs. Specifically, we first propose a controlled data synthesis pipeline that generates high-quality data featuring explicit evidence, fine-grained error-type labels, justifications, and corrections. With this pipeline, we construct large-scale training data and InFi-Check-FG, a manually verified benchmark for fine-grained fact-checking of LLM outputs. Building on this training data, we then develop InFi-Checker, a model that jointly provides supporting evidence, classifies fine-grained error types, and produces justifications along with corrections. Experiments show that InFi-Checker achieves state-of-the-art performance on InFi-Check-FG and generalizes strongly across various downstream tasks, substantially improving the utility and trustworthiness of factuality evaluation.