Large pretrained language models are widely used in downstream NLP tasks via task-specific fine-tuning, but such procedures can be costly. Recently, Parameter-Efficient Fine-Tuning (PEFT) methods have achieved strong task performance while updating far fewer parameters than full model fine-tuning (FFT). However, it is non-trivial to make informed design choices about PEFT configurations, such as their architecture, the number of tunable parameters, and even the layers in which the PEFT modules are inserted. Consequently, it is highly likely that current, manually designed configurations are suboptimal in terms of their performance-efficiency trade-off. Inspired by advances in neural architecture search, we propose AutoPEFT for automatic PEFT configuration selection. We first design an expressive configuration search space with multiple representative PEFT modules as building blocks. Using multi-objective Bayesian optimisation in a low-cost setup, we then discover a Pareto-optimal set of configurations that offer strong performance-cost trade-offs across different numbers of parameters and are also highly transferable across different tasks. Empirically, on GLUE and SuperGLUE tasks, we show that AutoPEFT-discovered configurations significantly outperform existing PEFT methods and are on par with or better than FFT without incurring substantial training efficiency costs.
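To make the notion of a Pareto-optimal set of configurations concrete, the following minimal sketch filters a list of candidate configurations down to those not dominated on two objectives: tunable-parameter count (lower is better) and task score (higher is better). It is an illustration only, not the AutoPEFT implementation. The `Candidate` type, the configuration names, and all numbers are hypothetical, and the Bayesian optimisation step that would propose candidates is not shown.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical PEFT configuration with two objectives."""
    name: str          # illustrative label, not from the paper
    num_params: float  # cost objective (e.g. % of tunable parameters), lower is better
    score: float       # task performance, higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if `a` is at least as good as `b` on both objectives
    and strictly better on at least one."""
    no_worse = a.num_params <= b.num_params and a.score >= b.score
    strictly_better = a.num_params < b.num_params or a.score > b.score
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates,
    sorted by increasing parameter count."""
    front = [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]
    return sorted(front, key=lambda c: c.num_params)


# Made-up values purely for illustration.
candidates = [
    Candidate("cfg_a", 0.2, 80.0),
    Candidate("cfg_b", 0.5, 84.0),
    Candidate("cfg_c", 0.6, 83.0),  # dominated by cfg_b
    Candidate("cfg_d", 1.5, 86.0),
    Candidate("cfg_e", 1.5, 85.0),  # dominated by cfg_d
]

front = pareto_front(candidates)
print([c.name for c in front])
```

Each configuration that survives the filter represents a different point on the performance-cost trade-off, so a practitioner could pick the one that fits a given parameter budget.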