Uncertainty quantification in Large Language Models (LLMs) is crucial for applications where safety and reliability are important. In particular, uncertainty can be used to improve the trustworthiness of LLMs by detecting factually incorrect model responses, commonly called hallucinations. Critically, one should seek to capture the model's semantic uncertainty, i.e., the uncertainty over the meanings of LLM outputs, rather than uncertainty over lexical or syntactic variations that do not affect answer correctness. To address this problem, we propose Kernel Language Entropy (KLE), a novel method for uncertainty estimation in white- and black-box LLMs. KLE defines positive semidefinite, unit-trace kernels that encode the semantic similarities of LLM outputs and quantifies uncertainty using the von Neumann entropy. It considers pairwise semantic dependencies between answers (or semantic clusters), providing more fine-grained uncertainty estimates than previous methods based on hard clustering of answers. We theoretically prove that KLE generalizes the previous state-of-the-art method called semantic entropy and empirically demonstrate that it improves uncertainty quantification performance across multiple natural language generation datasets and LLM architectures.
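The core quantity described above can be sketched in a few lines: normalize a positive semidefinite semantic-similarity kernel to unit trace and take the von Neumann entropy of the result. This is a minimal illustration, not the paper's implementation: the kernel here is a toy Gram matrix built from made-up 2-D "answer embeddings" (in practice, semantic similarities would come from a dedicated model, e.g. NLI-based scoring), and all names and values are hypothetical.

```python
import numpy as np

# Toy stand-ins for semantic similarities among 4 sampled LLM answers.
# We build a Gram matrix from made-up unit vectors so the kernel is
# positive semidefinite by construction; the two angle pairs form two
# loose "meaning clusters".
angles = np.array([0.0, 0.2, 1.3, 1.5])
emb = np.stack([np.cos(angles), np.sin(angles)], axis=1)
K = emb @ emb.T                                    # PSD, unit diagonal

def von_neumann_entropy(kernel: np.ndarray) -> float:
    """Von Neumann entropy of a PSD kernel after unit-trace normalization."""
    rho = kernel / np.trace(kernel)                # unit-trace, density-like
    eigvals = np.linalg.eigvalsh(rho)              # real eigenvalues (symmetric)
    eigvals = eigvals[eigvals > 1e-12]             # convention: 0 * log 0 = 0
    return float(-np.sum(eigvals * np.log(eigvals)))

# Entropy ranges from 0 (all answers share one meaning) to log(n)
# (all answers mutually dissimilar); higher means more semantic uncertainty.
print(von_neumann_entropy(K))
```

Unit-trace normalization makes the kernel's eigenvalues behave like a probability distribution, so the von Neumann entropy reduces to the Shannon entropy of that spectrum; an identity kernel (no shared meaning) attains the maximum log(n), while a rank-one kernel (one shared meaning) gives zero.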