The Uniform Information Density (UID) hypothesis suggests that effective communication maintains a stable flow of information. In this work, we revisit this principle in the context of large language model (LLM) reasoning traces, asking whether step-level uniformity reflects reasoning quality. To this end, we propose an entropy-based stepwise information density metric and introduce two complementary measures of uniformity: local and global uniformity scores. Across experiments on six reasoning benchmarks, we find that step-level uniformity not only provides a strong theoretical lens but also yields practical performance benefits; for example, selecting reasoning traces with more uniform step-level information density yields 10-32\% relative accuracy gains over baselines on AIME2025. Our analysis further reveals that correct reasoning traces tend to avoid sharp spikes in information density, while incorrect traces exhibit irregular information bursts. These results demonstrate that UID-inspired information density measures outperform alternative internal signals as predictors of reasoning quality, and they highlight uniformity of information density as a robust diagnostic and selection criterion for building more reliable and accurate reasoning systems.
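To make the metric concrete, the following is a minimal sketch of one plausible instantiation, not the paper's exact formulation: each step's information density is taken as the mean token surprisal (from per-token log-probabilities, assumed given), global uniformity penalizes the spread of step densities, and local uniformity penalizes sharp jumps between adjacent steps. The function names and the `1/(1 + dispersion)` normalization are illustrative assumptions.

```python
import math

def step_densities(step_logprobs):
    # step_logprobs: list of lists of token log-probabilities (hypothetical
    # input format), one inner list per reasoning step.
    # Information density of a step = mean token surprisal (-log p), in nats.
    return [sum(-lp for lp in step) / len(step) for step in step_logprobs]

def global_uniformity(densities):
    # Higher when step densities cluster tightly around their mean
    # (low standard deviation across the whole trace).
    mean = sum(densities) / len(densities)
    var = sum((d - mean) ** 2 for d in densities) / len(densities)
    return 1.0 / (1.0 + math.sqrt(var))

def local_uniformity(densities):
    # Higher when adjacent steps have similar density, i.e. the trace
    # avoids sharp step-to-step information bursts.
    diffs = [abs(a - b) for a, b in zip(densities, densities[1:])]
    return 1.0 / (1.0 + sum(diffs) / len(diffs))
```

Under this sketch, a trace whose steps carry roughly equal surprisal scores near 1.0 on both measures, while a trace with one high-surprisal burst scores lower, matching the selection criterion described above.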