Language models (LMs) have demonstrated remarkable capabilities across a wide range of tasks in various domains. Despite this impressive performance, the reliability of their outputs remains a concern, particularly given the demands of AI safety. Assessing the confidence of LM predictions and calibrating them across different tasks, so that confidence aligns with accuracy, can help mitigate risks and enable LMs to make better decisions. Although a variety of work has addressed this problem, a comprehensive overview of this important research area has been lacking. The present survey aims to bridge this gap. In particular, we discuss methods and techniques for LM confidence estimation and calibration, covering different LMs and various tasks. We further outline the challenges of estimating confidence for large language models and suggest promising directions for future work.