A Theory of Training Profit-Optimal LLMs

Scaling LLMs requires tremendous computational resources, and recent advances in AI have gone hand in hand with massive capital expenditure. While it is established that scaling up LLMs reliably increases model quality (quantified in terms of loss or downstream evaluations), it is unclear how these quality improvements translate into potential revenue, and whether revenue increases would offset the costs of larger-scale training and inference. In this work, we develop an economic model that characterizes the rational behavior of an LLM training firm by combining scaling laws with microeconomic theory. Under our model of firm behavior, LLM quality can be increased with more parameters and training tokens, leading to more potential adoption by consumers, each of whom has a quality threshold for using the LLM. On the other hand, additional parameters and training tokens both incur additional costs. We analyze the profit maximization problem for this model under compute-bound and data-bound regimes. In the compute-bound regime, the optimal model size and token budget track hardware efficiency $E$ (FLOPs/\$) at a near-linear rate; total training cost then scales sub-quadratically in $E$. Data efficiency improvements incentivize larger models and greater training expenditure. When data is limited to $D$, profit-optimal training expenditure scales as $D^2/E$, i.e., it increases with data and decreases with hardware efficiency (as well as data efficiency). Finally, we analyze practical trends in training expenditure: current trends are consistent with our most permissive model variants in the compute-bound regime, but are not profit-optimal in the data-bound regime or under the assumption that hardware advances will stall. Overall, our results provide a theory of profit-optimal LLM training, offering a foundation for engaging critically with industry statements and supporting long-term economic decision-making.
