The rise of large language models (LLMs) and their tight integration into our daily lives make it essential to dedicate efforts towards their trustworthiness. Uncertainty quantification for LLMs can not only increase human trust in their responses but also allow LLM agents to make more informed decisions based on each other's uncertainty. To estimate the uncertainty in a response, internal token logits, task-specific proxy models, or sampling of multiple responses are commonly used. This work focuses on asking the LLM itself to verbalize its uncertainty as a confidence score within its output tokens, a promising approach to prompt- and model-agnostic uncertainty quantification with low overhead. Using an extensive benchmark, we assess the reliability of verbalized confidence scores across different datasets, models, and prompt methods. Our results reveal that the reliability of these scores strongly depends on how the model is asked, but also that certain prompt methods can elicit well-calibrated confidence scores. We argue that verbalized confidence scores can become a simple, effective, and versatile uncertainty quantification method in the future. Our code is available at https://github.com/danielyxyang/llm-verbalized-uq.