Large language model (LLM) scaling laws are empirical formulas that estimate how model quality changes as parameter count and training data increase. However, these formulas, including the popular DeepMind Chinchilla scaling laws, do not account for the cost of inference. We modify the Chinchilla scaling laws to calculate the optimal LLM parameter count and pre-training data size needed to train and deploy a model of a given quality under a given inference demand. We conduct our analysis in terms of both a compute budget and real-world costs, and find that LLM researchers expecting reasonably large inference demand (~1B requests) should train models that are smaller, and trained for longer, than Chinchilla-optimal.
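To make the trade-off concrete, the following is a minimal sketch, not the authors' exact formulation. It assumes the Chinchilla parametric loss fit L(N, D) = E + A/N^α + B/D^β with the constants reported by Hoffmann et al. (2022), and the common approximations of 6·N·D FLOPs for training and 2·N FLOPs per inference token. The simple grid search, the function names, the target loss of 2.0, and the inference volume of 2e12 tokens are all illustrative assumptions.

```python
import math

# Chinchilla parametric loss fit: L(N, D) = E + A / N**ALPHA + B / D**BETA.
# Constants as reported by Hoffmann et al. (2022); they are assumptions here,
# since the abstract does not list them.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28


def loss(n_params, n_tokens):
    """Predicted pre-training loss for N parameters and D training tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def tokens_for_loss(n_params, target_loss):
    """Training tokens a model of n_params needs to reach target_loss."""
    gap = target_loss - E - A / n_params**ALPHA
    if gap <= 0:
        return math.inf  # model is too small to ever reach the target
    return (B / gap) ** (1.0 / BETA)


def total_flops(n_params, target_loss, inference_tokens):
    """Assumed cost model: 6*N*D for training plus 2*N per inference token."""
    d_train = tokens_for_loss(n_params, target_loss)
    return 6.0 * n_params * d_train + 2.0 * n_params * inference_tokens


def cheapest_model(target_loss, inference_tokens, grid_points=20000):
    """Grid-search N (log-spaced) for the lowest total FLOPs at target_loss."""
    # Smallest N that could reach the target with unlimited data.
    n_min = (A / (target_loss - E)) ** (1.0 / ALPHA)
    best_n, best_cost = None, math.inf
    for i in range(1, grid_points + 1):
        n = n_min * 10 ** (4.0 * i / grid_points)  # spans four decades
        cost = total_flops(n, target_loss, inference_tokens)
        if cost < best_cost:
            best_n, best_cost = n, cost
    return best_n, tokens_for_loss(best_n, target_loss), best_cost


TARGET = 2.0  # hypothetical target loss
n0, d0, _ = cheapest_model(TARGET, inference_tokens=0.0)   # training only
n1, d1, _ = cheapest_model(TARGET, inference_tokens=2e12)  # with inference

print(f"no inference:   N = {n0:.3e} params, D = {d0:.3e} tokens")
print(f"with inference: N = {n1:.3e} params, D = {d1:.3e} tokens")
```

Under these assumptions, adding inference demand moves the cheapest configuration toward fewer parameters and more training tokens at the same target loss, which matches the direction of the finding stated above. The magnitudes depend entirely on the assumed constants and cost model.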