Generalization of Fine-Tuned Uncertainty Communication and Metacognition in Large Language Models

Background. Large language models are increasingly used in settings where confident but incorrect answers can mislead users. Reliable uncertainty communication requires a form of metacognition: monitoring when one's own answers are likely to be correct. Yet models' stated confidence is often poorly aligned with answer correctness. We test whether supervised fine-tuning improves uncertainty communication and whether gains transfer across domains and task formats.

Methods. We fine-tuned two models on general knowledge, mathematics, and open-ended trivia questions. We evaluated two tasks: single-question confidence estimation, in which the model reports a numeric confidence for one answer, and pairwise confidence comparison, in which it chooses which of two questions it is more likely to answer correctly. We tested on held-out questions from the training domains and on new medical, legal, and truthfulness benchmarks. We assessed calibration, discrimination, and answer accuracy before and after fine-tuning.

Results. Fine-tuning improves the alignment between stated confidence and observed accuracy and increases the models' ability to assign higher confidence to correct answers than to incorrect ones. Gains occur within the training domains and, to a lesser extent, in new domains. However, training on a single task does not reliably transfer between single-question confidence estimation and pairwise confidence comparison. Multitask fine-tuning produces broader gains in the models and tasks studied here.

Conclusions. Uncertainty communication in large language models is trainable, but transfer across metacognitive tasks is limited. Joint training on multiple confidence tasks may support broader generalization, although further tests across model families and metacognitive tasks are needed.
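The abstract evaluates calibration and discrimination but does not say which metrics were used. As a minimal sketch, assuming expected calibration error (ECE) as the calibration measure and AUROC as the discrimination measure, the two quantities could be computed from stated confidences and correctness labels as follows. The function names, the ten-bin default, and the choice of metrics are illustrative assumptions, not the authors' method.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[int], n_bins: int = 10
) -> float:
    """Assumed calibration metric: weighted mean |confidence - accuracy| per bin."""
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, is_correct))
    ece = 0.0
    for items in bins:
        if not items:
            continue
        mean_conf = sum(conf for conf, _ in items) / len(items)
        accuracy = sum(is_correct for _, is_correct in items) / len(items)
        ece += (len(items) / n) * abs(mean_conf - accuracy)
    return ece


def auroc(confidences: Sequence[float], correct: Sequence[int]) -> float:
    """Assumed discrimination metric: probability that a correct answer
    receives higher confidence than an incorrect one (ties count as 0.5)."""
    pos = [c for c, y in zip(confidences, correct) if y]
    neg = [c for c, y in zip(confidences, correct) if not y]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect answer")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: stated confidences and whether each answer was correct.
stated = [0.9, 0.8, 0.6, 0.3]
labels = [1, 1, 0, 0]
print("ECE:", expected_calibration_error(stated, labels))
print("AUROC:", auroc(stated, labels))
```

Under these assumptions, a lower ECE corresponds to better alignment between stated confidence and observed accuracy, and a higher AUROC corresponds to a stronger tendency to assign higher confidence to correct answers than to incorrect ones.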
