Neural scaling laws characterize how model performance improves as model size increases. Inspired by empirical observations, we introduce a resource model of neural scaling. A task is usually composite and can therefore be decomposed into many subtasks, which compete for resources (measured by the number of neurons allocated to each subtask). On toy problems, we empirically find that: (1) The loss of a subtask is inversely proportional to the number of neurons allocated to it. (2) When multiple subtasks are present in a composite task, the resources acquired by each subtask grow uniformly as models get larger, keeping the ratios of acquired resources constant. We hypothesize that these findings hold generally and build a model to predict neural scaling laws for general composite tasks, which successfully replicates the neural scaling law of Chinchilla models reported in arXiv:2203.15556. We believe that the notion of resource used in this paper will be a useful tool for characterizing and diagnosing neural networks.
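The two empirical findings can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the example ratios and coefficients, and the choice to sum subtask losses into a composite loss are all assumptions made for illustration. Under those assumptions, fixed allocation ratios combined with a per-subtask loss that is inversely proportional to allocated neurons give a composite loss that is itself inversely proportional to the total neuron count.

```python
def allocate(total_neurons, ratios):
    """Split the neuron budget among subtasks in fixed proportions (finding 2)."""
    total = sum(ratios)
    return [total_neurons * r / total for r in ratios]


def composite_loss(total_neurons, ratios, coeffs):
    """Sum of per-subtask losses, each equal to coeff / allocated neurons (finding 1).

    Summing the subtask losses is an illustrative assumption.
    """
    neurons = allocate(total_neurons, ratios)
    return sum(c / n for c, n in zip(coeffs, neurons))


# Hypothetical values: three subtasks with fixed resource ratios.
ratios = [0.5, 0.3, 0.2]
coeffs = [1.0, 2.0, 0.5]

loss_small = composite_loss(100, ratios, coeffs)
loss_large = composite_loss(200, ratios, coeffs)

# Doubling the neuron budget halves the composite loss in this toy setting.
print(loss_small / loss_large)
```

Because every subtask's share scales by the same factor, the ratio of the two losses depends only on the ratio of the neuron budgets, not on the particular ratios or coefficients chosen.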