Pre-trained models (PTMs) have achieved great success in various Software Engineering (SE) downstream tasks following the ``pre-train then fine-tune'' paradigm. Because fully fine-tuning all parameters of a PTM can be computationally expensive, a widely used alternative is parameter-efficient fine-tuning (PEFT), which freezes the PTM and trains only newly introduced extra parameters. Although prior work has tested PEFT methods in the SE field, a comprehensive evaluation is still lacking. This paper aims to fill this gap by evaluating the effectiveness of five PEFT methods on eight PTMs and four SE downstream tasks. Across tasks and PEFT methods, we seek answers to the following research questions: 1) Is it more effective to use PTMs trained specifically on source code, or is it sufficient to use PTMs trained on natural language text? 2) What is the impact of varying model sizes? 3) How does the model architecture affect performance? Beyond effectiveness, we also discuss the efficiency of PEFT methods in terms of required training time and GPU resource consumption. We hope that our findings provide a deeper understanding of PEFT methods across various PTMs and SE downstream tasks. All code and data are available at \url{https://github.com/zwtnju/PEFT.git}.
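To make the "freeze the PTM, introduce extra parameters" idea concrete, the following is a minimal sketch in plain Python. It assumes a LoRA-style low-rank update purely as an illustration; the abstract does not name the five PEFT methods evaluated, and the class name `LowRankAdaptedLinear`, the layer sizes, and the rank are hypothetical choices, not taken from the paper or its repository.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LowRankAdaptedLinear:
    """One linear layer with a frozen weight and a trainable low-rank update.

    Illustrative assumption: the extra parameters are two small matrices
    A (rank x d_in) and B (d_out x rank), as in LoRA-style adapters.
    """

    def __init__(self, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        # Stand-in for a pre-trained weight; it is never updated.
        self.frozen_w = [[rng.gauss(0, 0.02) for _ in range(d_in)]
                         for _ in range(d_out)]
        # Extra parameters: the only values a training loop would update.
        self.a = [[rng.gauss(0, 0.02) for _ in range(d_in)]
                  for _ in range(rank)]
        # B starts at zero, so the adapted layer initially matches the frozen one.
        self.b = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.frozen_w, x)             # frozen path
        delta = matvec(self.b, matvec(self.a, x))   # low-rank trainable path
        return [u + v for u, v in zip(base, delta)]

    def frozen_param_count(self):
        return len(self.frozen_w) * len(self.frozen_w[0])

    def trainable_param_count(self):
        return (len(self.a) * len(self.a[0])
                + len(self.b) * len(self.b[0]))


if __name__ == "__main__":
    layer = LowRankAdaptedLinear(d_in=256, d_out=256, rank=4)
    frozen = layer.frozen_param_count()
    trainable = layer.trainable_param_count()
    print(f"frozen: {frozen}, trainable: {trainable}, "
          f"ratio: {trainable / frozen:.4f}")
```

In this toy layer, the trainable parameters amount to 2048 against 65536 frozen ones, which is the kind of reduction that motivates measuring training time and GPU consumption alongside effectiveness. Actual savings depend on the PEFT method, the model, and the task.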