Large Language Models (LLMs) demonstrate remarkable performance in semantic understanding and generation, yet accurately assessing the reliability of their outputs remains a significant challenge. While numerous studies have explored calibration techniques, they focus primarily on white-box LLMs with accessible parameters. Black-box LLMs, despite their superior performance, place heightened demands on calibration techniques because they can only be accessed through APIs. Although recent research has achieved breakthroughs in calibrating black-box LLMs, a systematic survey of these methodologies is still lacking. To bridge this gap, we present the first comprehensive survey of calibration techniques for black-box LLMs. We first define the Calibration Process of LLMs as comprising two interrelated key steps: Confidence Estimation and Calibration. Second, we systematically review the methods applicable in black-box settings and provide insights into the unique challenges of, and connections between, these key steps. Furthermore, we explore typical applications of the Calibration Process in black-box LLMs and outline promising future research directions, offering new perspectives for enhancing reliability and human-machine alignment. Our GitHub repository is available at: https://github.com/LiangruXie/Calibration-Process-in-Black-Box-LLMs
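To make the two steps of the Calibration Process concrete, the sketch below illustrates one common black-box-compatible instantiation: consistency-based confidence estimation (the agreement rate among repeatedly sampled answers, requiring no access to model logits) followed by histogram binning, which remaps raw confidence scores to empirical accuracy on a small labeled calibration set. This is a minimal illustrative example, not a method proposed by the survey; the specific techniques and toy data are assumptions chosen for clarity.

```python
from collections import Counter

def estimate_confidence(samples):
    """Confidence Estimation (black-box friendly): sample the LLM several
    times and use the fraction of answers agreeing with the majority
    answer as a raw confidence score."""
    counts = Counter(samples)
    answer, freq = counts.most_common(1)[0]
    return answer, freq / len(samples)

def fit_histogram_bins(confidences, correctness, n_bins=5):
    """Calibration step 1: on a labeled calibration set, map each
    confidence bin to the empirical accuracy observed in that bin."""
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correctness):
        idx = min(int(c * n_bins), n_bins - 1)  # clamp c = 1.0 into last bin
        bins[idx].append(y)
    # Empty bins fall back to the bin midpoint as a neutral estimate.
    return [sum(b) / len(b) if b else (i + 0.5) / n_bins
            for i, b in enumerate(bins)]

def calibrate(confidence, bin_acc):
    """Calibration step 2: replace a raw confidence score with the
    empirical accuracy of its bin."""
    n_bins = len(bin_acc)
    idx = min(int(confidence * n_bins), n_bins - 1)
    return bin_acc[idx]

# Toy example: four sampled answers from a hypothetical black-box LLM.
answer, conf = estimate_confidence(["Paris", "Paris", "Lyon", "Paris"])
# Fit binning on a tiny hypothetical labeled set (confidence, correct?).
bin_acc = fit_histogram_bins([0.9, 0.8, 0.55, 0.3], [1, 1, 0, 0], n_bins=5)
calibrated = calibrate(conf, bin_acc)
```

In practice, the calibration set would be much larger, and alternatives such as Platt scaling or verbalized-confidence prompting can replace either step; the point is only that both steps operate purely on API outputs.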