The confidence that LLMs express in language should faithfully reflect their intrinsic uncertainty. While recent work shows that LLMs struggle to use epistemic markers (e.g., "it is likely...") in a human-aligned fashion, it remains unclear whether models can apply their own linguistic confidence framework, associating each marker with a specific confidence level in a stable and generalizable way, and how contextual features affect this ability. We conduct the first systematic study of this question, formalizing _marker internal confidence_ (MIC) as the estimated intrinsic confidence that a model associates with a specific epistemic marker in a given task domain. We present 7 metrics to evaluate the stability of MICs within and across distributions. Applying our analysis framework to diverse models and tasks, we find that LLMs remain poorly calibrated in the faithfulness sense even when marker meanings are interpreted from the model's own perspective: they struggle to differentiate markers by internal confidence across distributions, although they preserve a somewhat consistent ranking of markers across tasks. These findings supply critical, complementary evidence to existing work toward a holistic understanding of faithful calibration in LLMs, and emphasize the need for more aligned and stable marker use to improve trustworthiness and reliability.