We present a training-free method for detecting valid mathematical reasoning in large language models through spectral analysis of attention patterns. By treating attention matrices as adjacency matrices of dynamic graphs over tokens, we extract four interpretable spectral diagnostics: the Fiedler value (algebraic connectivity), high-frequency energy ratio (HFER), graph signal smoothness, and spectral entropy. These diagnostics exhibit statistically significant differences between valid and invalid mathematical proofs. Experiments across seven transformer models from four independent architectural families (Meta Llama, Alibaba Qwen, Microsoft Phi, and Mistral AI) demonstrate that this spectral signature produces effect sizes up to Cohen's $d = 3.30$ ($p < 10^{-116}$), enabling 85.0--95.6\% classification accuracy under rigorous evaluation, with calibrated thresholds reaching 93--95\% on the full dataset. The method requires no training data, fine-tuning, or learned classifiers: a single threshold on a spectral metric suffices for high accuracy. Through systematic label correction, we discover that the spectral method detects logical coherence rather than compiler acceptance, identifying mathematically valid proofs that formal verifiers reject due to technical failures. We further identify an architectural dependency: Mistral-7B's Sliding Window Attention shifts the discriminative signal from HFER to late-layer Smoothness ($d = 2.09$, $p_{\text{MW}} = 1.16 \times 10^{-48}$), revealing that attention mechanism design affects which spectral features capture reasoning validity. These findings establish spectral graph analysis as a principled framework for reasoning verification with immediate applications to hallucination detection and AI safety monitoring.
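To make the four diagnostics named in the abstract concrete, the following is a minimal sketch of how they might be computed for a single attention head. The abstract does not give formulas, so every definition here is an assumption: the attention matrix is symmetrized, the combinatorial Laplacian $L = D - W$ is used, the graph signal is taken to be the token hidden states, "high frequency" means the upper half of the Laplacian spectrum, and smoothness is the normalized Dirichlet energy. The function name `spectral_diagnostics` and the parameter `hf_fraction` are hypothetical.

```python
import numpy as np


def spectral_diagnostics(attn, signal, hf_fraction=0.5):
    """Illustrative spectral diagnostics for one attention head.

    attn:   (n, n) attention matrix over n tokens.
    signal: (n,) or (n, d) graph signal, e.g. token hidden states (assumed).
    hf_fraction: share of the spectrum counted as "high frequency" (assumed).
    """
    attn = np.asarray(attn, dtype=float)
    signal = np.asarray(signal, dtype=float)
    if signal.ndim == 1:
        signal = signal[:, None]
    n = attn.shape[0]
    if attn.shape != (n, n) or signal.shape[0] != n:
        raise ValueError("attn must be (n, n) and signal must have n rows")

    # Assumption: symmetrize so the graph is undirected and L is symmetric.
    w = 0.5 * (attn + attn.T)
    lap = np.diag(w.sum(axis=1)) - w  # combinatorial Laplacian L = D - W

    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(lap)

    # Graph Fourier transform of the signal and energy per graph frequency.
    coeffs = eigvecs.T @ signal
    energy = (coeffs ** 2).sum(axis=1)
    total = energy.sum()
    if total == 0.0:
        raise ValueError("signal has zero energy")

    # Fiedler value: second-smallest Laplacian eigenvalue.
    fiedler = eigvals[1]

    # HFER: share of signal energy in the top hf_fraction of frequencies.
    cutoff = n - int(np.ceil(hf_fraction * n))
    hfer = energy[cutoff:].sum() / total

    # Smoothness as normalized Dirichlet energy tr(X^T L X) / ||X||_F^2
    # (lower values mean a smoother signal on the graph).
    smoothness = np.trace(signal.T @ lap @ signal) / total

    # Spectral entropy: Shannon entropy of the normalized energy distribution.
    p = energy / total
    p = p[p > 0]
    spectral_entropy = -(p * np.log(p)).sum()

    return {
        "fiedler": float(fiedler),
        "hfer": float(hfer),
        "smoothness": float(smoothness),
        "spectral_entropy": float(spectral_entropy),
    }
```

Under these assumptions, the threshold classifier the abstract describes would reduce to comparing one of the returned values, for a chosen layer and head, against a calibrated cutoff. Which metric and layer to use is model-dependent, as the Mistral-7B result indicates.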