We present a functional form, which we refer to as a Unified Neural Scaling Law (UNSL), that accurately models and extrapolates the scaling behavior of deep neural networks as multiple dimensions vary simultaneously. That is, it describes how the evaluation metric of interest changes as one jointly varies the number of model parameters, training dataset size, number of training steps, number of inference steps, amount of compute, and various hyperparameters. It does so for various architectures and for each task in a varied set of upstream and downstream tasks, which includes large-scale vision, language, math, and reinforcement learning. Compared with other functional forms for neural scaling, UNSL yields considerably more accurate extrapolations of scaling behavior on this set.