Calibration is commonly evaluated by comparing model confidence with its empirical correctness, implicitly treating reliability as a function of the confidence score alone. However, this view can hide substantial structure: models may be systematically overconfident on some kinds of inputs and underconfident on others, causing global reliability diagnostics to obscure localised calibration failures. To address this, we formulate the problem of discovering hidden miscalibration regimes without assuming access to predefined data slices. We define the corresponding miscalibration field and propose a diagnostic framework for estimating it. Our approach learns a calibration-aware representation of the input space and estimates signed local miscalibration by kernel smoothing in the learned geometry. Across four real-world LLM benchmarks and twelve LLMs, we find that input-dependent calibration heterogeneity is prevalent. We further show that the discovered fields are actionable: they support local confidence correction and reduce calibration error in systematically miscalibrated regions where confidence-based methods such as isotonic regression and temperature scaling are less effective.
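To make the idea of estimating signed local miscalibration by kernel smoothing concrete, the following is a minimal sketch and not the paper's implementation. It assumes that the field at a point is the locally averaged gap between confidence and correctness, that a Gaussian kernel with a fixed bandwidth is used, and that the learned calibration-aware representation is already available as an embedding array. The function name `local_miscalibration`, the `bandwidth` parameter, and the synthetic two-regime data are all hypothetical choices made for illustration.

```python
import numpy as np


def local_miscalibration(z_query, z_ref, conf_ref, correct_ref, bandwidth=0.5):
    """Nadaraya-Watson estimate of the signed gap (confidence - correctness).

    Assumption: z_* are embeddings of shape (n, d) from some learned
    representation; the abstract does not specify how it is trained.
    Positive values indicate local overconfidence, negative values
    local underconfidence.
    """
    z_query = np.asarray(z_query, dtype=float)  # (n_query, d)
    z_ref = np.asarray(z_ref, dtype=float)      # (n_ref, d)
    residual = np.asarray(conf_ref, dtype=float) - np.asarray(correct_ref, dtype=float)

    # Pairwise squared Euclidean distances, shape (n_query, n_ref).
    diff = z_query[:, None, :] - z_ref[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)

    # Gaussian kernel weights, computed in log space for numerical stability.
    log_w = -sq_dist / (2.0 * bandwidth ** 2)
    log_w -= log_w.max(axis=1, keepdims=True)
    weights = np.exp(log_w)
    weights /= weights.sum(axis=1, keepdims=True)

    # Kernel-weighted average of the signed residuals.
    return weights @ residual


# Synthetic illustration (hypothetical data): two input regimes whose
# errors cancel out in a global summary.
rng = np.random.default_rng(0)
n = 500
z_ref = np.concatenate([
    rng.normal(-2.0, 0.5, size=(n, 1)),  # regime A
    rng.normal(2.0, 0.5, size=(n, 1)),   # regime B
])
conf_ref = np.concatenate([np.full(n, 0.9), np.full(n, 0.6)])
correct_ref = np.concatenate([
    rng.random(n) < 0.6,  # regime A: confidence 0.9, accuracy about 0.6
    rng.random(n) < 0.9,  # regime B: confidence 0.6, accuracy about 0.9
]).astype(float)

global_gap = float(np.mean(conf_ref - correct_ref))
field = local_miscalibration(np.array([[-2.0], [2.0]]), z_ref, conf_ref, correct_ref)

print("global signed gap:", round(global_gap, 3))
print("local field at regime centres:", np.round(field, 3))
```

In this toy setting the global signed gap is close to zero while the local estimates have opposite signs in the two regimes, which is the kind of structure a confidence-only diagnostic could miss. Subtracting the estimated field from the raw confidence and clipping to [0, 1] would be one simple form of local correction; whether that matches the correction used in the paper is not stated in the abstract.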