Chain-of-thought (CoT) reasoning and its variants have substantially improved the performance of language models on complex reasoning tasks, yet the precise mechanisms by which different strategies facilitate generalization remain poorly understood. While current explanations often point to increased test-time computation or structural guidance, establishing a consistent, quantifiable link between these factors and generalization remains challenging. In this work, we identify intrinsic dimensionality as a quantitative measure for characterizing the effectiveness of reasoning chains: the minimum number of model dimensions needed to reach a target accuracy threshold on a task. By keeping the model architecture fixed and varying only the task formulation through different reasoning strategies, we demonstrate that effective reasoning strategies consistently reduce the intrinsic dimensionality of the task. Validating this on GSM8K with Gemma-3 1B and 4B, we observe a strong inverse correlation between the intrinsic dimensionality of a reasoning strategy and its generalization performance on both in-distribution and out-of-distribution data. Our findings suggest that effective reasoning chains facilitate learning by compressing the task into fewer parameters, offering a new quantitative lens for analyzing reasoning processes.
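The operational definition above — the smallest number of trainable dimensions that clears an accuracy threshold — can be illustrated with a toy random-subspace training experiment, in the spirit of Li et al.'s intrinsic-dimension measurements: train only `d` free parameters, map them into the full parameter space through a fixed random projection, and sweep `d` upward until the threshold is met. Everything below (the logistic-regression task, the 0.9 threshold, the dimension grid) is an illustrative assumption, not the paper's actual GSM8K/Gemma-3 setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: D-dimensional linearly separable binary classification.
D = 200                                # full parameter count
X = rng.normal(size=(500, D))          # training inputs
w_true = rng.normal(size=D)            # ground-truth linear rule
y = (X @ w_true > 0).astype(float)     # binary labels

def subspace_accuracy(d, steps=400, lr=0.5):
    """Train only d parameters; a frozen random projection P lifts them
    into the full space, so the effective weights are w = P @ theta."""
    P = rng.normal(size=(D, d)) / np.sqrt(d)   # fixed random basis
    theta = np.zeros(d)                        # the only trainable params
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ (P @ theta)))  # sigmoid predictions
        grad_w = X.T @ (p - y) / len(y)             # gradient in full space
        theta -= lr * (P.T @ grad_w)                # chain rule back to subspace
    return float(np.mean((X @ (P @ theta) > 0) == (y > 0.5)))

# Intrinsic dimension: smallest d on the grid whose accuracy clears the bar.
threshold = 0.9
intrinsic_dim = next((d for d in [5, 10, 20, 50, 100, 200]
                      if subspace_accuracy(d) >= threshold), D)
print(f"smallest d reaching {threshold:.0%} train accuracy: {intrinsic_dim}")
```

Holding the task fixed and comparing such measurements across reasoning formats is the analogue of the paper's fixed-architecture, varied-strategy comparison: a format that makes the task easier to fit should clear the threshold at a smaller `d`.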