Large reasoning models (LRMs) have heterogeneous inference energy costs that depend on which model is used and how much it reasons. Reducing energy use therefore requires choosing the right LRM and operating it in the right way. Consequently, the performance of systems that dispatch tasks to different LRMs depends on the balance between mean energy provisioning and stochastic fluctuations. The critical regime is the unique operating point at which neither auxiliary energy nor baseline energy is systematically wasted: increasing baseline supply shifts the system toward persistent over-supply and baseline-energy waste, while reducing supply induces persistent reliance on auxiliary energy. Even in this regime, performance remains volatility-limited, so we develop a second-order characterization that provides further insight. Here, performance is governed by how variability is absorbed across time, models, and execution choices. This perspective highlights variance-aware routing and dispatch as a principled design axis and provides a theoretical basis for developing energy-aware model routing policies. We also characterize routing behavior when dispatch policies are based on training-compute and inference-compute scaling laws for LRMs.
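As a rough illustration of the critical regime described above, the sketch below assumes a deliberately simplified setting that is not taken from the paper: per-period energy demand is Gaussian with a given mean and standard deviation, a fixed baseline supply is provisioned, auxiliary energy covers any shortfall, and unused baseline energy is wasted. The function names and the Gaussian assumption are hypothetical, chosen only to make the mean-versus-fluctuation trade-off concrete.

```python
import math


def expected_positive_part(mean, std):
    """Return E[max(X, 0)] for X ~ Normal(mean, std**2), with std > 0."""
    z = mean / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return mean * cdf + std * pdf


def energy_imbalance(baseline, mean_demand, std_demand):
    """Expected (auxiliary, wasted) energy under the assumed Gaussian demand."""
    # Auxiliary energy covers demand that exceeds the baseline supply.
    auxiliary = expected_positive_part(mean_demand - baseline, std_demand)
    # Wasted baseline energy is the supply left unused when demand is lower.
    wasted = expected_positive_part(baseline - mean_demand, std_demand)
    return auxiliary, wasted


if __name__ == "__main__":
    # Hypothetical numbers: mean demand 100, standard deviation 10.
    for baseline in (80.0, 100.0, 120.0):
        aux, waste = energy_imbalance(baseline, 100.0, 10.0)
        print(f"baseline={baseline:6.1f}  auxiliary={aux:7.3f}  wasted={waste:7.3f}")
```

Under these assumptions, setting the baseline equal to mean demand makes expected auxiliary use and expected waste equal, and both are proportional to the standard deviation of demand. This is one simple sense in which performance at the balanced operating point is volatility-limited, and moving the baseline away from the mean makes one of the two terms dominate.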