Large language models hallucinate confidently, making uncertainty quantification (UQ) essential for reliable deployment. Existing methods rely predominantly on token-level signals, leaving the geometric structure of intermediate hidden states underused. In this paper, we take the geometric complexity of hidden-state matrices as a measure of the global uncertainty of LLMs, while treating token-level uncertainty estimation as a local metric. We show that hidden-state geometric entropy (global uncertainty) and token-level entropy (local uncertainty) are statistically near-orthogonal, capturing distinct failure regimes for reliability prediction. In particular, global geometry recovers the confident-but-wrong failure mode that local signals systematically miss. Building on this, we propose Global-Local Uncertainty (GLU), an unsupervised, single-pass score that fuses the two signals via a multiplicative gate. Across three model families and six benchmarks, GLU matches or outperforms all unsupervised baselines while requiring only a single forward pass and remaining length-normalized and architecture-agnostic.
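The abstract does not define the geometric entropy or the multiplicative gate, so the following is only a minimal sketch of how a global and a local signal could be computed from one forward pass and then fused. Every definition here is an assumption made for illustration, not the paper's formulation: `spectral_entropy` stands in for hidden-state geometric entropy (entropy of the normalized squared singular values of the centered hidden-state matrix), `token_entropy` is the mean predictive entropy per token, and `gated_score` is a hypothetical gate chosen so that the global term still contributes when the local term is near zero, which is the confident-but-wrong case.

```python
import numpy as np


def token_entropy(probs: np.ndarray) -> float:
    """Local signal (assumed form): mean Shannon entropy over tokens.

    probs has shape (T, V): one predictive distribution per generated token.
    Dividing by log(V) maps the result into [0, 1]; averaging over T keeps
    it independent of sequence length.
    """
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    per_token = -(p * np.log(p)).sum(axis=-1)
    return float(per_token.mean() / np.log(probs.shape[-1]))


def spectral_entropy(hidden: np.ndarray) -> float:
    """Global signal (assumed form): entropy of the singular-value spectrum.

    hidden has shape (T, d): one hidden-state vector per token from a single
    layer. The squared singular values of the centered matrix are normalized
    into a distribution; its entropy is divided by log(min(T, d)) so the
    value lies in [0, 1].
    """
    k = min(hidden.shape)
    if k < 2:
        return 0.0
    centered = hidden - hidden.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    energy = s ** 2
    total = energy.sum()
    if total <= 0.0:
        return 0.0
    q = energy / total
    q = q[q > 1e-12]  # drop numerically zero directions
    return float(-(q * np.log(q)).sum() / np.log(k))


def gated_score(local_u: float, global_u: float) -> float:
    """Hypothetical multiplicative gate, NOT the paper's formula.

    The global term is scaled up by the local term, but stays nonzero when
    local_u is 0, so a confident-but-wrong answer can still score high.
    """
    return global_u * (1.0 + local_u)
```

In practice `probs` would come from a softmax over the model's logits and `hidden` from an intermediate layer of the same forward pass; which layer to read is left open here because the abstract does not say.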