Inference-time computation has emerged as a promising axis for scaling the reasoning abilities of large language models. However, despite impressive empirical gains, the optimal allocation of inference-time computation remains poorly understood. A central question is whether to prioritize sequential scaling (e.g., longer chains of thought) or parallel scaling (e.g., majority voting across multiple short chains of thought). In this work, we seek to illuminate the landscape of test-time scaling by demonstrating the existence of reasoning settings where sequential scaling offers an exponential advantage over parallel scaling. These settings are based on graph connectivity problems over challenging distributions of graphs. We validate our theoretical findings with comprehensive experiments across a range of language models, including models trained from scratch for graph connectivity with different chain-of-thought strategies, as well as large reasoning models.