Large language models (LLMs) are notorious for hallucinating, i.e., producing erroneous claims in their output. Such hallucinations can be dangerous because occasional factual inaccuracies are easily obscured by the surrounding, largely factual text, making them extremely hard for users to spot. Current services that leverage LLMs usually provide no means of detecting unreliable generations. Here, we aim to bridge this gap. In particular, we propose a novel fact-checking and hallucination detection pipeline based on token-level uncertainty quantification. Uncertainty scores leverage information encapsulated in the output of a neural network or its layers to detect unreliable predictions, and we show that they can be used to fact-check the atomic claims in the LLM output. Moreover, we present a novel token-level uncertainty quantification method that removes the impact of uncertainty about what claim to generate at the current step and what surface form to use. Our method, Claim Conditioned Probability (CCP), measures only the uncertainty of the particular claim value expressed by the model. Experiments on the task of biography generation demonstrate strong improvements for CCP over the baselines for six different LLMs and three languages. Human evaluation reveals that the fact-checking pipeline based on uncertainty quantification is competitive with a fact-checking tool that leverages external knowledge.
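To make the idea of claim-level fact-checking from token-level uncertainty concrete, the following is a minimal sketch of a simple probability-based baseline, not the CCP method itself. It assumes that per-token log-probabilities of the generated text are available and that each atomic claim has already been mapped to the indices of the tokens that express it. The function names, the claim labels, and the 0.5 threshold are hypothetical choices made for illustration only.

```python
import math
from typing import Dict, List


def claim_uncertainty(token_logprobs: List[float], claim_token_idx: List[int]) -> float:
    """Baseline claim uncertainty: 1 minus the product of the probabilities
    of the tokens that express the claim (computed in log space)."""
    total_logprob = sum(token_logprobs[i] for i in claim_token_idx)
    return 1.0 - math.exp(total_logprob)


def flag_claims(
    token_logprobs: List[float],
    claims: Dict[str, List[int]],
    threshold: float = 0.5,  # hypothetical cut-off, would need tuning on labeled data
) -> Dict[str, bool]:
    """Mark a claim as potentially unreliable when its uncertainty exceeds the threshold."""
    return {
        name: claim_uncertainty(token_logprobs, idx) > threshold
        for name, idx in claims.items()
    }


# Toy example: four generated tokens with assumed model probabilities.
token_probs = [0.9, 0.8, 0.5, 0.95]
token_logprobs = [math.log(p) for p in token_probs]

# Hypothetical mapping from atomic claims to the tokens that express them.
claims = {"birth_year": [0, 1], "birth_place": [2, 3]}

flags = flag_claims(token_logprobs, claims)
```

In this toy example, the claim whose tokens were generated with lower probability ends up above the threshold and would be highlighted for the user. A baseline of this kind mixes together uncertainty about which claim to state, which surface form to use, and the claim value itself; isolating the last of these is the motivation for CCP.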