As natural language becomes the default interface for human-AI interaction, LMs need to communicate uncertainties appropriately in downstream applications. In this work, we investigate how LMs incorporate confidence into responses via natural language and how downstream users behave in response to LM-articulated uncertainties. We examine publicly deployed models and find that LMs are reluctant to express uncertainty when answering questions, even when they produce incorrect responses. LMs can be explicitly prompted to express confidence levels, but they tend to be overconfident, resulting in high error rates (an average of 47%) among confident responses. We test the risks of LM overconfidence through human experiments and show that users rely heavily on LM generations, whether or not they are marked by certainty. Lastly, we investigate the preference-annotated datasets used in post-training alignment and find that humans are biased against texts that express uncertainty. Our work highlights new safety harms facing human-LM interactions and proposes design recommendations and mitigation strategies moving forward.