Evaluating LLM reliability via scalar probabilities often fails to capture the structural dynamics of reasoning. We introduce TRACED, a framework that assesses reasoning quality through theoretically grounded geometric kinematics. By decomposing reasoning traces into Progress (displacement) and Stability (curvature), we reveal a distinct topological divergence: correct reasoning manifests as high-progress, stable trajectories, whereas hallucinations are characterized by low-progress, unstable patterns (stalled displacement with high curvature fluctuations). Leveraging these signatures, our probabilistic framework achieves competitive performance and superior robustness across diverse benchmarks. Crucially, TRACED bridges geometry and cognition by mapping high curvature to "Hesitation Loops" and displacement to "Certainty Accumulation", offering a physical lens to decode the internal dynamics of machine thought.
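To make the Progress/Stability decomposition concrete, the following is a minimal sketch under stated assumptions, not the TRACED implementation. The abstract gives no formulas, so the sketch assumes that a reasoning trace is a sequence of state vectors, that Progress can be approximated by the net displacement between the first and last state, and that Stability can be approximated (inversely) by the mean turning angle between consecutive steps. The function names `progress` and `mean_curvature` and the toy 2-D trajectories are hypothetical and chosen only for illustration.

```python
import math


def _sub(a, b):
    """Element-wise difference a - b of two equal-length vectors."""
    return [x - y for x, y in zip(a, b)]


def _norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def progress(traj):
    """Assumed proxy for Progress: net displacement from first to last state."""
    return _norm(_sub(traj[-1], traj[0]))


def mean_curvature(traj):
    """Assumed proxy for instability: mean turning angle (radians)
    between consecutive step vectors. Higher means less stable."""
    steps = [_sub(b, a) for a, b in zip(traj, traj[1:])]
    angles = []
    for u, v in zip(steps, steps[1:]):
        nu, nv = _norm(u), _norm(v)
        if nu == 0.0 or nv == 0.0:
            continue  # turning angle is undefined for a zero-length step
        cos = sum(x * y for x, y in zip(u, v)) / (nu * nv)
        # Clamp to [-1, 1] to guard against floating-point drift.
        angles.append(math.acos(max(-1.0, min(1.0, cos))))
    return sum(angles) / len(angles) if angles else 0.0


# Toy trajectories (illustrative only, not real model states).
straight = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
looping = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]

print(progress(straight), mean_curvature(straight))
print(progress(looping), mean_curvature(looping))
```

In this toy setting the straight trajectory has large displacement and zero turning angle, while the back-and-forth trajectory has zero displacement and a turning angle of π at every step, which mirrors the high-progress/stable versus low-progress/unstable contrast described in the abstract.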