To maintain user trust, large language models (LLMs) should signal low confidence on examples where they are incorrect, instead of misleading the user. The standard approach to estimating confidence is to use the softmax probabilities of these models, but as of November 2023, state-of-the-art LLMs such as GPT-4 and Claude-v1.3 do not provide access to these probabilities. We first study eliciting confidence linguistically -- asking an LLM for its confidence in its answer -- which performs reasonably (80.5% AUC on GPT-4 averaged across 12 question-answering datasets -- 7% above a random baseline) but leaves room for improvement. We then explore using a surrogate confidence model -- a model whose probabilities we can access -- to evaluate the original model's confidence in a given question. Surprisingly, even though these probabilities come from a different and often weaker model, this method leads to higher AUC than linguistic confidences on 9 out of 12 datasets. Our best method, which composes linguistic confidences and surrogate model probabilities, gives state-of-the-art confidence estimates on all 12 datasets (84.6% average AUC on GPT-4).
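
As a rough illustration of the quantities mentioned above, the following Python sketch scores a set of confidence estimates and combines two confidence sources. It rests on two assumptions that the abstract does not confirm: that "AUC" means the area under the accuracy-coverage curve of selective classification (under which a random confidence ordering scores roughly the model's plain accuracy), and that "composing" can be approximated by a weighted average with a hypothetical weight `alpha`. The function names `selective_auc` and `mix_confidences`, and all the numbers, are made up for illustration.

```python
from typing import List, Sequence


def selective_auc(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Area under the accuracy-coverage curve (assumed reading of "AUC").

    Examples are ranked from most to least confident. For each prefix of
    that ranking (coverage k/n) we compute the accuracy on the prefix,
    then average those accuracies over all k. Ties in confidence are
    broken by input order, because Python's sort is stable.
    """
    if len(confidences) != len(correct) or len(confidences) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    hits = 0
    total = 0.0
    for k, i in enumerate(order, start=1):
        hits += int(correct[i])
        total += hits / k  # accuracy on the k most confident examples
    return total / len(order)


def mix_confidences(
    linguistic: Sequence[float], surrogate: Sequence[float], alpha: float = 0.5
) -> List[float]:
    """Weighted average of two confidence sources.

    ASSUMPTION: a simple linear mixture; the abstract only says the two
    signals are "composed" and does not give the rule or the weight.
    """
    if len(linguistic) != len(surrogate):
        raise ValueError("inputs must have equal length")
    return [(1 - alpha) * l + alpha * s for l, s in zip(linguistic, surrogate)]


if __name__ == "__main__":
    # Toy, made-up data: whether the main model's answer was right, the
    # confidence it stated in words, and a surrogate model's probability.
    correct = [True, True, False, True, False, False]
    linguistic = [0.9, 0.9, 0.9, 0.8, 0.8, 0.8]
    surrogate = [0.71, 0.42, 0.53, 0.64, 0.25, 0.36]
    mixed = mix_confidences(linguistic, surrogate, alpha=0.5)

    print("linguistic:", selective_auc(linguistic, correct))
    print("surrogate: ", selective_auc(surrogate, correct))
    print("mixed:     ", selective_auc(mixed, correct))
```

The toy linguistic confidences take only two distinct values, so many examples tie and the ranking carries little information. Adding a finer-grained second signal can break such ties, which is one plausible reason a mixture might help. The numbers here show the mechanics only and say nothing about the reported results.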