Uncertainty quantification (UQ) methods for Large Language Models (LLMs) encompass a variety of approaches, of which two types are particularly prominent: information-based methods, which focus on model confidence expressed as token probabilities, and consistency-based methods, which assess the semantic relationship between multiple outputs generated by repeated sampling. Several recent methods combine these two approaches and have shown impressive performance in various applications. However, they sometimes fail to outperform much simpler baseline methods. Our investigation reveals distinctive characteristics of LLMs as probabilistic models, which help to explain why these UQ methods underperform on certain tasks. Based on these findings, we propose a new way of synthesizing model confidence and output consistency that leads to a family of efficient and robust UQ methods. We evaluate our approach across a variety of tasks, including question answering, abstractive summarization, and machine translation, and demonstrate sizable improvements over state-of-the-art UQ approaches.