The ever-growing ecosystem of LLMs poses a challenge in selecting the most appropriate pre-trained model to fine-tune amid a sea of options. Given constrained resources, fine-tuning all models and then making a selection is unrealistic. In this work, we formulate this resource-constrained selection task as predicting fine-tuning performance and illustrate its natural connection with the Scaling Law. Unlike pre-training, we find that the fine-tuning scaling curve includes not just the well-known "power phase" but also the previously unobserved "pre-power phase". We also explain, both theoretically and empirically, why the existing Scaling Law fails to capture this phase-transition phenomenon. To address this, we introduce the concept of "pre-learned data size" into our Rectified Scaling Law, which overcomes the theoretical limitations and fits experimental results much better. By leveraging our law, we propose a novel LLM selection algorithm that selects the near-optimal model with hundreds of times less resource consumption, while other methods may provide negatively correlated selections.
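To make the two-phase behavior concrete, here is a minimal numerical sketch. It assumes a rectified form in which the pre-learned data size shifts the fine-tuning data size inside the power term, i.e. loss ≈ B + A / (D_l + D)^α; the exact functional form and the parameter values (A, B, α, D_l) below are illustrative assumptions, not the paper's fitted constants.

```python
import numpy as np

def rectified_scaling_law(D, A, B, alpha, D_l):
    """Predicted fine-tuning loss after D fine-tuning examples.

    D_l (the assumed "pre-learned data size") shifts the curve:
    for D << D_l the loss is nearly flat (pre-power phase), while
    for D >> D_l it follows the familiar power law (power phase).
    Setting D_l = 0 recovers the standard form B + A / D**alpha.
    """
    return B + A / (D_l + D) ** alpha

# Hypothetical parameters, chosen only to make the phases visible.
A, B, alpha, D_l = 10.0, 1.0, 0.5, 1000.0

# Pre-power phase: D is small relative to D_l, so loss barely moves.
small = rectified_scaling_law(np.array([1.0, 10.0, 100.0]), A, B, alpha, D_l)

# Power phase: D dominates D_l, so loss decays toward the floor B.
large = rectified_scaling_law(np.array([1e5, 1e6]), A, B, alpha, D_l)
```

A plain power law fit to the `small` regime would extrapolate a continuing decline and miss the plateau, which is the failure mode the rectified form is meant to fix.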