Large language models (LLMs) offer broad utility but remain prone to hallucination and out-of-distribution (OOD) errors. We propose EigenTrack, an interpretable real-time detector that uses the spectral geometry of hidden activations, a compact global signature of model dynamics. By streaming covariance-spectrum statistics such as spectral entropy, eigenvalue gaps, and KL divergence from random baselines into a lightweight recurrent classifier, EigenTrack tracks temporal shifts in representation structure that signal hallucination and OOD drift before surface errors appear. Unlike black-box and grey-box methods, it requires only a single forward pass, with no resampling. Unlike existing white-box detectors, it preserves temporal context, aggregates global signals, and offers interpretable accuracy-latency trade-offs.
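The covariance-spectrum statistics named above can be sketched as follows. This is a minimal illustrative implementation, not the authors' reference code: the function name `spectral_signature` and the choice of a uniform spectrum as the random baseline are assumptions for the example.

```python
import numpy as np

def spectral_signature(hidden_states: np.ndarray):
    """Sketch of covariance-spectrum statistics over hidden activations.

    hidden_states: (T, d) array of token-level hidden activations.
    Returns spectral entropy, the top eigenvalue gap, and the KL
    divergence of the normalised spectrum from a uniform baseline
    (a stand-in for the paper's random baseline).
    """
    # Centre activations and form the d x d sample covariance.
    X = hidden_states - hidden_states.mean(axis=0, keepdims=True)
    cov = X.T @ X / max(len(X) - 1, 1)

    # Eigenvalues in descending order, clipped for numerical safety.
    eig = np.linalg.eigvalsh(cov)[::-1]
    eig = np.clip(eig, 1e-12, None)

    p = eig / eig.sum()                        # normalised spectrum
    entropy = float(-(p * np.log(p)).sum())    # spectral entropy
    gap = float(eig[0] - eig[1])               # top eigenvalue gap
    q = np.full_like(p, 1.0 / len(p))          # uniform baseline spectrum
    kl = float((p * np.log(p / q)).sum())      # KL(p || uniform)
    return entropy, gap, kl
```

Streaming these three numbers per generation step into a small recurrent classifier is the mechanism the abstract describes; only a single forward pass of the LLM is needed to obtain the hidden states.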