Large language models (LLMs) have demonstrated remarkable performance across diverse tasks by encoding vast amounts of factual knowledge. However, they are still prone to hallucinations, generating incorrect or misleading information that is often accompanied by high uncertainty. Existing methods for hallucination detection primarily focus on quantifying internal uncertainty, which arises from missing or conflicting knowledge within the model. However, hallucinations can also stem from external uncertainty, where ambiguous user queries lead to multiple possible interpretations. In this work, we introduce Semantic Volume, a novel mathematical measure for quantifying both external and internal uncertainty in LLMs. Our approach perturbs queries and responses, embeds them in a semantic space, and computes the determinant of the Gram matrix of the embedding vectors, capturing their dispersion as a measure of uncertainty. Our framework provides a generalizable and unsupervised uncertainty detection method that does not require white-box access to LLMs. We conduct extensive experiments on both external and internal uncertainty detection, demonstrating that Semantic Volume consistently outperforms existing baselines on both tasks. Additionally, we provide theoretical insights linking our measure to differential entropy, unifying and extending previous sampling-based uncertainty measures such as semantic entropy. Semantic Volume is shown to be a robust and interpretable approach to improving the reliability of LLMs by systematically detecting uncertainty in both user queries and model responses.
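The dispersion computation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `semantic_volume`, the row normalization, and the `eps` regularizer are assumptions added for numerical stability; the abstract specifies only that the measure is the determinant of the Gram matrix of the embedding vectors of perturbed queries or responses.

```python
import numpy as np

def semantic_volume(embeddings: np.ndarray, eps: float = 1e-8) -> float:
    """Dispersion score for a set of embedding vectors.

    embeddings: (n, d) array, one row per embedded perturbation of a
    query or response. Returns the log-determinant of the regularized
    Gram matrix; higher values indicate greater semantic dispersion,
    i.e. more uncertainty. (Sketch under assumptions noted above.)
    """
    # Normalize rows so the score reflects directional spread,
    # not embedding magnitude (an assumed preprocessing step).
    E = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Gram matrix of pairwise inner products.
    G = E @ E.T
    # Regularize so the determinant is well-defined when n > d or
    # when vectors are nearly collinear.
    G += eps * np.eye(G.shape[0])
    _, logdet = np.linalg.slogdet(G)
    return logdet
```

Intuitively, embeddings that cluster tightly (consistent interpretations or answers) yield a near-singular Gram matrix and a very negative log-determinant, while spread-out embeddings (divergent interpretations or answers) yield a larger score, flagging higher uncertainty.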