An exactly solvable model for emergence and scaling laws

Deep learning models can exhibit what appears to be a sudden ability to solve a new problem as training time, training data, or model size increases, a phenomenon known as emergence. In this paper, we present a framework in which each new ability (a skill) is represented as a basis function. We solve a simple multi-linear model in this skill basis, finding analytic expressions for the emergence of new skills, as well as for scaling laws of the loss with training time, data size, model size, and optimal compute ($C$). We compare our detailed calculations to direct simulations of a two-layer neural network trained on multitask sparse parity, where the tasks in the dataset are distributed according to a power law. Using a single fit parameter, our simple model captures the sigmoidal emergence of multiple new skills in the neural network as training time, data size, or model size increases.
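To make the experimental setup concrete, here is a minimal sketch of how a multitask sparse parity dataset with power-law task frequencies might be generated. The abstract does not specify the construction, so everything below is an assumption for illustration: the function name, the one-hot control bits, the subset size `k`, the exponent convention `P(task i) ∝ i^-(alpha + 1)`, and all default values.

```python
import numpy as np


def make_multitask_sparse_parity(n_samples, n_tasks=5, n_bits=16, k=3,
                                 alpha=0.6, seed=0):
    """Hypothetical generator; all names and defaults are illustrative."""
    rng = np.random.default_rng(seed)

    # Assumption: each task (skill) owns a fixed random subset of k bit positions.
    subsets = np.stack([rng.choice(n_bits, size=k, replace=False)
                        for _ in range(n_tasks)])

    # Assumption: task frequencies follow a power law, P(i) ∝ i^-(alpha + 1).
    probs = np.arange(1, n_tasks + 1, dtype=float) ** -(alpha + 1.0)
    probs /= probs.sum()

    # Draw one task index per sample according to the power law.
    tasks = rng.choice(n_tasks, size=n_samples, p=probs)

    # Uniform random bit strings, plus one-hot control bits naming the task.
    bits = rng.integers(0, 2, size=(n_samples, n_bits))
    control = np.eye(n_tasks, dtype=int)[tasks]
    x = np.concatenate([control, bits], axis=1)

    # Label: parity of the k bits belonging to the sampled task.
    selected = np.take_along_axis(bits, subsets[tasks], axis=1)
    y = selected.sum(axis=1) % 2
    return x, y, tasks, subsets, probs


if __name__ == "__main__":
    x, y, tasks, subsets, probs = make_multitask_sparse_parity(1000)
    print(x.shape, y[:10], probs)
```

Under these assumptions, frequent tasks appear far more often than rare ones, which is the kind of imbalance under which skills would be expected to emerge one after another rather than all at once.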

