For large language models (LLMs) to be trusted by humans, they need to be well-calibrated in the sense that they can accurately assess and communicate how likely it is that their predictions are correct. Recent work has focused on the quality of internal LLM confidence assessments, but the question remains of how well LLMs can communicate this internal confidence to human users. This paper explores the disparity between external human confidence in an LLM's responses and the internal confidence of the model. Through experiments involving multiple-choice questions, we systematically examine human users' ability to discern the reliability of LLM outputs. Our study focuses on two key areas: (1) assessing users' perception of true LLM confidence and (2) investigating the impact of tailored explanations on this perception. The research shows that default explanations from LLMs often lead users to overestimate both the model's confidence and its accuracy. When the explanations are modified to reflect the LLM's internal confidence more accurately, we observe a significant shift in user perception that aligns it more closely with the model's actual confidence levels. This adjustment in explanatory approach demonstrates potential for enhancing user trust and users' accuracy in assessing LLM outputs. The findings underscore the importance of transparent communication of confidence levels in LLMs, particularly in high-stakes applications where understanding the reliability of AI-generated information is essential.