Neural scaling laws characterize how model performance improves as model size increases. Inspired by empirical observations, we introduce a resource model of neural scaling. A task is usually composite and can therefore be decomposed into many subtasks, which compete for resources (measured by the number of neurons allocated to each subtask). On toy problems, we empirically find that: (1) The loss of a subtask is inversely proportional to the number of neurons allocated to it. (2) When multiple subtasks are present in a composite task, the resources acquired by each subtask grow uniformly as models get larger, keeping the ratios of acquired resources constant. We hypothesize that these findings hold generally and build a model to predict neural scaling laws for general composite tasks, which successfully replicates the neural scaling law of Chinchilla models reported in arXiv:2203.15556. We believe that the notion of resource used in this paper will be a useful tool for characterizing and diagnosing neural networks.
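As a rough illustration of the two empirical findings, the following Python sketch computes the loss of a composite task when each subtask holds a fixed share of the neurons and each subtask loss is inversely proportional to its neuron count. The function name `composite_loss`, the `fractions` and `coefficients` values, and the choice to sum subtask losses into a total are assumptions made for illustration; they are not taken from the paper, and the sketch does not attempt to reproduce the Chinchilla fit.

```python
def composite_loss(total_neurons, fractions, coefficients):
    """Toy loss of a composite task under the two findings in the abstract.

    fractions:    hypothetical share of neurons held by each subtask (sums to 1).
    coefficients: hypothetical per-subtask constants of proportionality.
    Assumption: the total loss is the sum of the subtask losses.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    loss = 0.0
    for fraction, coeff in zip(fractions, coefficients):
        # Finding (2): each subtask keeps a constant share as the model grows.
        neurons = fraction * total_neurons
        # Finding (1): subtask loss is inversely proportional to its neurons.
        loss += coeff / neurons
    return loss


if __name__ == "__main__":
    fractions = [0.5, 0.3, 0.2]      # illustrative values only
    coefficients = [1.0, 2.0, 0.5]   # illustrative values only
    for n in (100, 200, 400):
        print(n, composite_loss(n, fractions, coefficients))
```

Under these assumptions, doubling the total number of neurons halves every subtask loss and therefore halves the composite loss, so the composite loss is itself inversely proportional to the total neuron count.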