We propose a novel parameter-efficient training (PET) method for large language models that adapts models to downstream tasks by optimizing a small subset of the existing model parameters. Unlike prior methods, this subset is not fixed in location: the set of parameters being modified evolves over the course of training. This dynamic parameter selection achieves strong performance with far fewer parameters than existing methods. Our method also scales seamlessly, allowing the subset to span an arbitrary proportion of the total model size, whereas popular PET approaches such as prompt tuning and LoRA cover only a small part of this spectrum. For a given parameter budget, we match or outperform prompt tuning and LoRA in most cases on a variety of NLP tasks (MT, QA, GSM8K, SuperGLUE) across different model families and sizes.
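To make the idea of a dynamically evolving parameter subset concrete, the sketch below shows one way such a scheme could be implemented in PyTorch. It is a minimal illustration only, not the paper's actual algorithm: the gradient-magnitude selection criterion, the refresh interval, and the helper names (`select_masks`, `train_step`) are all assumptions introduced here for exposition.

```python
import torch

def select_masks(model, k_frac):
    """Illustrative subset selection: keep the k_frac fraction of parameters
    with the largest gradient magnitude. (Assumed criterion, not necessarily
    the one used in the paper.)"""
    grads = torch.cat([p.grad.abs().flatten()
                       for p in model.parameters() if p.grad is not None])
    k = max(1, int(k_frac * grads.numel()))
    threshold = torch.topk(grads, k).values.min()
    return [(p.grad.abs() >= threshold) if p.grad is not None
            else torch.zeros_like(p, dtype=torch.bool)
            for p in model.parameters()]

def train_step(model, optimizer, loss_fn, batch, masks):
    """One fine-tuning step that updates only the currently selected subset."""
    optimizer.zero_grad()
    loss = loss_fn(model, batch)
    loss.backward()
    # Zero out gradients outside the active subset so the optimizer
    # leaves those parameters untouched.
    for p, m in zip(model.parameters(), masks):
        if p.grad is not None:
            p.grad.mul_(m)
    optimizer.step()
    return loss.item()

def finetune(model, optimizer, loss_fn, data_loader,
             k_frac=0.001, refresh_every=100):
    """Periodically re-select which parameters are trainable, so the
    subset evolves over the course of training."""
    masks = None
    for step, batch in enumerate(data_loader):
        if masks is None or step % refresh_every == 0:
            # Compute gradients once purely to choose the new subset.
            optimizer.zero_grad()
            loss_fn(model, batch).backward()
            masks = select_masks(model, k_frac)
        train_step(model, optimizer, loss_fn, batch, masks)
```

Under this reading, the parameter budget is controlled directly by `k_frac`, which is what allows the subset size to be dialed to any proportion of the model, in contrast to prompt tuning or LoRA, where the budget is tied to prompt length or adapter rank.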