The advent of large language models (LLMs) such as GitHub Copilot has significantly enhanced programmers' productivity, particularly in code generation. However, these models often struggle with real-world tasks without fine-tuning, and as LLMs grow larger and more capable, fine-tuning them for specialized tasks becomes increasingly expensive. Parameter-efficient fine-tuning (PEFT) methods, which update only a subset of model parameters, offer a promising solution by reducing the computational cost of tuning LLMs while maintaining their performance. Existing studies have applied PEFT and LLMs to various code-related tasks and found that the effectiveness of PEFT techniques is task-dependent. Their application to unit test generation, however, remains underexplored: the state of the art is limited to LLMs with full fine-tuning. This paper investigates both full fine-tuning and several PEFT methods, including LoRA, (IA)^3, and prompt tuning, across different model architectures and sizes, using well-established benchmark datasets to evaluate their effectiveness for unit test generation. Our findings show that PEFT methods can deliver performance comparable to full fine-tuning for unit test generation, making specialized fine-tuning more accessible and cost-effective. Notably, prompt tuning is the most efficient in terms of cost and resource utilization, while LoRA approaches the effectiveness of full fine-tuning in several cases.
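To make the parameter-efficiency claim concrete, the following is a minimal sketch of why LoRA trains far fewer parameters than full fine-tuning. It assumes the standard LoRA formulation (a frozen weight matrix W plus a trainable low-rank update B·A of rank r); the hidden size and rank below are illustrative values, not figures from this paper.

```python
# Sketch: trainable-parameter counts for full fine-tuning vs. LoRA on a
# single d x d weight matrix. LoRA freezes W and learns a low-rank update
# B @ A, where A is r x d and B is d x r, so only 2*d*r parameters are
# trained instead of d*d. Dimensions here are illustrative assumptions.

def full_finetune_params(d: int) -> int:
    """Full fine-tuning: every entry of the d x d matrix is trainable."""
    return d * d

def lora_params(d: int, r: int) -> int:
    """LoRA: only A (r x d) and B (d x r) are trainable; W stays frozen."""
    return 2 * d * r

d, r = 4096, 8  # hypothetical hidden size and LoRA rank
full = full_finetune_params(d)
lora = lora_params(d, r)
print(f"full: {full:,}  lora: {lora:,}  fraction trained: {lora / full:.4%}")
# For d=4096, r=8: LoRA trains under 0.4% of the parameters of this layer.
```

The same arithmetic extends across all adapted layers of a model, which is why PEFT methods can cut tuning cost so sharply while, as the results above indicate, remaining competitive with full fine-tuning.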