Scaling laws relate model quality to compute budget (FLOPs), but practitioners often face wall-clock time constraints rather than compute budgets. We study optimal model sizing under fixed time budgets from 5 minutes to 24 hours on a consumer GPU (RTX 4090). Across 70+ runs spanning 50M--1031M parameters, we find: (1)~at each time budget a U-shaped curve emerges in which too-small models overfit and too-large models undertrain; (2)~optimal model size follows $N^* \propto t^{0.60}$, growing \emph{faster} than Chinchilla's $N^* \propto C^{0.50}$, with $\alpha = 0.60 \pm 0.07$ remaining above the compute-optimal exponent across all sensitivity analyses; (3)~a \emph{dual U-shape mechanism}: short-budget U-curves arise from compute bottlenecks, while long-budget U-curves emerge from data bottlenecks (overfitting), with an intermediate regime in which the U-curve temporarily disappears. These findings have immediate implications for researchers training on consumer hardware, where wall-clock time, not FLOPs, is the binding constraint. We release all code, logs, and 70+ experimental configurations.
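To make the exponent comparison explicit (a sketch, assuming sustained throughput $r$ in FLOP/s is roughly constant on fixed hardware, so that $C = r\,t$):
\[
N^* \propto C^{0.50} = (r\,t)^{0.50} \propto t^{0.50} \quad \text{(Chinchilla, compute-optimal)}
\qquad \text{vs.} \qquad
N^* \propto t^{0.60} \quad \text{(observed)}.
\]
Under this assumption, the fitted $\alpha = 0.60 \pm 0.07$ means optimal model size grows faster with wall-clock time than a direct time-for-compute substitution into Chinchilla would predict.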