Uncertainty quantification (UQ) methods for Large Language Models (LLMs) encompass a variety of approaches, with two major types being particularly prominent: information-based methods, which focus on model confidence expressed as token probabilities, and consistency-based methods, which assess the semantic relationship between multiple outputs generated via repeated sampling. Several recent methods have combined these two approaches and shown impressive performance in various applications. However, they sometimes fail to outperform much simpler baseline methods. Our investigation reveals distinctive characteristics of LLMs as probabilistic models, which help to explain why these UQ methods underperform in certain tasks. Based on these findings, we propose a new way of synthesizing model confidence and output consistency that leads to a family of efficient and robust UQ methods. We evaluate our approach across a variety of tasks such as question answering, abstractive summarization, and machine translation, demonstrating sizable improvements over state-of-the-art UQ approaches.
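The two families of UQ methods named above can be illustrated with a minimal sketch. Everything here is an illustrative assumption, not the paper's proposed method: the function names, the length-normalized log-probability score as the information-based signal, the Jaccard word-overlap proxy for semantic consistency, and the simple weighted average used to combine them.

```python
import math
from itertools import combinations

def information_confidence(token_logprobs):
    # Information-based signal: length-normalized sequence probability,
    # i.e. exp of the mean token log-probability (a common baseline form).
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def jaccard(a, b):
    # Crude lexical stand-in for semantic similarity between two outputs.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def consistency(samples):
    # Consistency-based signal: mean pairwise similarity across
    # outputs obtained by repeated sampling from the same prompt.
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

def combined_confidence(token_logprobs, samples, alpha=0.5):
    # Naive synthesis of the two signals; the weighting scheme here is
    # a placeholder, not the combination the abstract refers to.
    return (alpha * information_confidence(token_logprobs)
            + (1 - alpha) * consistency(samples))
```

For example, four tokens each with probability 0.5 give an information score of 0.5, while three sampled answers of which two agree exactly ("paris", "paris", "london") give a consistency score of 1/3.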