Large Language Models (LLMs) display formidable capabilities in generative tasks but also pose potential risks due to their tendency to generate hallucinatory responses. Uncertainty Quantification (UQ), the evaluation of model output reliability, is crucial for ensuring the safety and robustness of AI systems. Recent studies have concentrated on model uncertainty by analyzing the relationship between output entropy under various sampling conditions and the corresponding labels. However, these methods primarily focus on measuring model entropy precisely to capture response characteristics, while neglecting the uncertainty of the greedy decoding results (the source of the model's labels), which can lead to biased classification outcomes. In this paper, we examine the biases introduced by greedy decoding and propose a label-confidence-aware (LCA) uncertainty estimation method that uses Kullback-Leibler (KL) divergence to bridge the sampled responses and the label source, thereby enhancing the reliability and stability of uncertainty assessments. Our empirical evaluations across a range of popular LLMs and NLP datasets reveal that different label sources can indeed affect classification, and that our approach effectively captures differences between sampling results and label sources, yielding more effective uncertainty estimation.
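To make the KL-divergence bridging concrete, the following is a minimal sketch (not the paper's implementation) of how one might compare a sampled response against the greedy-decoded label source at the token level. It assumes per-token probability distributions over the vocabulary are available for both the sample and the greedy output; the function names and the averaging scheme are illustrative assumptions.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions over the vocabulary.
    A small epsilon avoids log(0) for zero-probability entries."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def sample_to_label_divergence(sample_dists, greedy_dists):
    """Hypothetical helper: average per-token KL between a sampled
    response's token distributions and those of the greedy-decoded
    label source. A lower value indicates the sample agrees more
    closely with the label source."""
    kls = [kl_divergence(p, q) for p, q in zip(sample_dists, greedy_dists)]
    return float(np.mean(kls))
```

Such a per-sample divergence could then weight each sample's contribution to an entropy-based uncertainty score, so that samples disagreeing strongly with the label source are discounted.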