We introduce BSDetector, a method for detecting bad and speculative answers from a pretrained Large Language Model (LLM) by estimating a numeric confidence score for any output it generates. Our uncertainty quantification technique works for any LLM accessible only via a black-box API, even when its training data is unknown. By expending a bit of extra computation, users of any LLM API can get the same response as they ordinarily would, along with a confidence estimate that cautions when not to trust this response. Experiments on both closed-form and open-form question-answering benchmarks reveal that BSDetector identifies incorrect LLM responses more accurately than alternative uncertainty estimation procedures (for both GPT-3 and ChatGPT). By sampling multiple responses from the LLM and selecting the one with the highest confidence score, we can additionally obtain more accurate responses from the same LLM, without any extra training steps. In applications involving automated evaluation with LLMs, accounting for our confidence scores leads to more reliable evaluation in both human-in-the-loop and fully automated settings (across both GPT-3.5 and GPT-4).
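To make the "sample several responses and keep the most confident one" step concrete, the following is a minimal sketch under stated assumptions. The abstract does not specify how the confidence score is computed, so the scorer below is a stand-in: it uses simple agreement among the sampled answers as a proxy for confidence, which is not claimed to be BSDetector's actual scoring rule. The names `agreement_confidence`, `select_most_confident`, and `should_trust`, as well as the 0.5 threshold, are hypothetical and chosen only for illustration. The sampled responses are passed in as a plain list, so no particular LLM API is assumed.

```python
from typing import Callable, List, Tuple


def agreement_confidence(answer: str, samples: List[str]) -> float:
    """Stand-in confidence score (an assumption, not the paper's method):
    the fraction of sampled answers that match `answer` after
    trimming whitespace and lowercasing."""
    if not samples:
        return 0.0
    target = answer.strip().lower()
    matches = sum(1 for s in samples if s.strip().lower() == target)
    return matches / len(samples)


def select_most_confident(
    candidates: List[str],
    score: Callable[[str, List[str]], float] = agreement_confidence,
) -> Tuple[str, float]:
    """Score every sampled response and return the (response, confidence)
    pair with the highest confidence. Ties go to the earliest candidate."""
    if not candidates:
        raise ValueError("need at least one candidate response")
    scored = [(c, score(c, candidates)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])


def should_trust(confidence: float, threshold: float = 0.5) -> bool:
    """Flag a response as trustworthy only if its confidence reaches
    a user-chosen threshold (0.5 here is an arbitrary illustration)."""
    return confidence >= threshold


if __name__ == "__main__":
    # Responses as they might come back from repeated sampling of one prompt.
    sampled = ["Paris", "paris ", "Lyon", "Paris"]
    best, conf = select_most_confident(sampled)
    print(best, conf, should_trust(conf))
```

Any scoring function with the same signature can be passed as `score`, so a better confidence estimator can replace the agreement proxy without changing the selection logic.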