Context: Test-driven development (TDD) is a widely employed software development practice in which test cases are developed from requirements prior to writing the code. Although various methods for automated test case generation have been proposed, they are not specifically tailored for TDD, where requirements rather than code serve as the input. Objective: In this paper, we introduce a text-to-testcase generation approach based on a large language model (GPT-3.5) that is fine-tuned on our curated dataset with an effective prompt design. Method: Our approach enhances the basic GPT-3.5 model for the text-to-testcase generation task by fine-tuning it on our curated dataset and applying an effective prompt design. We evaluated the effectiveness of our approach on five large-scale open-source software projects. Results: Our approach generated 7k test cases for the open-source projects, achieving 78.5% syntactic correctness, 67.09% requirement alignment, and 61.7% code coverage, substantially outperforming all other evaluated LLMs (basic GPT-3.5, Bloom, and CodeT5). In addition, our ablation study demonstrates that both the fine-tuning and prompting components of the GPT-3.5 model contribute substantial performance improvements. Conclusions: These findings lead us to conclude that fine-tuning and prompting should be considered in the future when building a language model for the text-to-testcase generation task.
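The abstract does not specify the prompt design itself. Purely as an illustrative sketch (the template wording, function name, and example requirement below are assumptions, not the paper's actual prompt), a text-to-testcase prompt for such a model might be assembled like this:

```python
# Hypothetical sketch: building a text-to-testcase prompt for a
# fine-tuned LLM. The template and example requirement are assumed
# for illustration and are not taken from the paper.

def build_testcase_prompt(requirement: str, class_name: str) -> str:
    """Combine a natural-language requirement with instructions that
    steer the model toward a compilable JUnit test case."""
    return (
        "You are given a software requirement. Write a JUnit test case "
        "that verifies it.\n"
        f"Target class: {class_name}\n"
        f"Requirement: {requirement}\n"
        "Respond with compilable Java test code only."
    )

prompt = build_testcase_prompt(
    requirement="The method returns the sum of two integers.",
    class_name="Calculator",
)
print(prompt)
```

The prompt string would then be sent to the fine-tuned model, whose output is checked for syntactic correctness and coverage as described in the Results.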