As the cost of training ever larger language models has grown, so has the interest in reusing previously learnt knowledge. Transfer learning methods have shown how reusing non-task-specific knowledge can help in subsequent task-specific learning. In this paper, we investigate the inverse: porting whole functional modules that encode task-specific knowledge from one model to another. We designed a study comprising 1,440 training/testing runs to test the portability of modules trained by parameter-efficient finetuning (PEFT) techniques, using sentiment analysis as an example task. We test portability in a wide range of scenarios, involving different PEFT techniques and different pretrained host models, among other dimensions. We compare the performance of ported modules with that of equivalent modules trained (i) from scratch, and (ii) from parameters sampled from the same distribution as the ported module. We find that the ported modules far outperform the two alternatives tested, but that there are interesting performance differences between the four PEFT techniques. We conclude that task-specific knowledge in the form of structurally modular sets of parameters, as produced by PEFT techniques, is highly portable, but that the degree of success depends on the type of PEFT technique and on the differences between the originating and receiving pretrained models.