We develop task scaling laws and model ladders to predict the individual task performance of pretrained language models (LMs) in the overtrained setting. Standard power laws for language modeling loss cannot accurately model task performance. We therefore use a two-step prediction approach: first, model and data size are used to predict a task-specific loss, and then this task loss is used to predict task performance. We train a set of small-scale "ladder" models, collect data points to fit the parameterized functions of the two prediction steps, and make predictions for two target models: a 7B model trained to 4T tokens and a 13B model trained to 5T tokens. Training the ladder models costs only 1% of the compute used for the target models. On four multiple-choice tasks written in ranked classification format, we can predict the accuracy of both target models within 2 points of absolute error. On four other tasks we observe higher prediction error (an average absolute error of 6.9 points) and find that these are often tasks with higher variance in task metrics. We also find that using less compute to train fewer ladder models tends to degrade predictions. Finally, we empirically show that our design choices and the two-step approach yield superior performance in establishing scaling laws.
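The two-step pipeline can be sketched as a pair of fitted parameterized functions chained together. In this minimal sketch, step 1 uses a Chinchilla-style power law in model size N and token count D, and step 2 maps task loss to accuracy through a sigmoid; both functional forms, the ladder configurations, and every constant below are illustrative assumptions fit on synthetic data, not the paper's exact parameterization or results.

```python
import numpy as np
from scipy.optimize import curve_fit

# Step 1: predict task loss from model size N and data size D with a
# Chinchilla-style power law (an assumed form, not the paper's exact one).
def task_loss(X, E, A, alpha, B, beta):
    N, D = X
    return E + A / N**alpha + B / D**beta

# Step 2: map task loss to accuracy with a sigmoid between a random-chance
# floor p_min and a ceiling p_max (also an assumed parameterization).
def task_accuracy(L, p_min, p_max, k, L0):
    return p_min + (p_max - p_min) / (1.0 + np.exp(k * (L - L0)))

# Hypothetical "ladder" runs: four small model sizes, each trained on
# several token budgets (multiples of a ~20x-params token budget).
sizes = np.array([190e6, 370e6, 760e6, 1e9])
mults = np.array([1.0, 2.0, 5.0, 10.0])
N = np.repeat(sizes, len(mults))
D = 20.0 * N * np.tile(mults, len(sizes))

# Ground-truth parameters used here only to generate noiseless toy data.
L_obs = task_loss((N, D), 0.5, 400.0, 0.3, 600.0, 0.3)
acc_obs = task_accuracy(L_obs, 0.25, 0.9, 4.0, 1.8)

# Fit each step's parameterized function to the ladder data points.
p1, _ = curve_fit(task_loss, (N, D), L_obs,
                  p0=[0.4, 300.0, 0.25, 500.0, 0.25], maxfev=20000)
p2, _ = curve_fit(task_accuracy, L_obs, acc_obs,
                  p0=[0.25, 0.9, 3.0, 1.5], maxfev=20000)

# Chain the two steps to extrapolate to a 7B-parameter / 4T-token target.
L_target = task_loss((7e9, 4e12), *p1)
acc_target = task_accuracy(L_target, *p2)
print(f"predicted task loss {L_target:.3f}, accuracy {acc_target:.3f}")
```

Fitting the two steps separately lets step 2 absorb the nonlinearity between loss and accuracy (e.g. the random-chance floor of multiple-choice tasks) that a single loss power law cannot capture.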