Compute scaling for LLM reasoning trades off exploring solution approaches (\emph{breadth}) against refining promising ones (\emph{depth}), yet why a given trade-off works, and why it often fails to transfer across models, remain unclear. We argue that \textbf{the optimal strategy depends on the model's \emph{diversity profile}, the spread of probability mass across solution approaches, and that this profile must be characterized before any exploration strategy is adopted.} We formalize this with a framework that decomposes reasoning uncertainty and derives when depth-based refinement outperforms parallel sampling, and we validate it across three model families at both inference and training time. Our central finding is that the diversity regime dictates the strategy: low-diversity aligned models benefit from depth-based refinement with lightweight intrinsic signals, whereas high-diversity base models are often harmed by it and instead need breadth or stronger signals to compensate.