Context: Test-driven development (TDD) is a widely employed software development practice in which test cases are developed from requirements prior to writing the code. Although various methods for automated test case generation have been proposed, they are not specifically tailored for TDD, where requirements rather than code serve as the input. Objective: In this paper, we introduce a text-to-testcase generation approach based on a large language model (GPT-3.5) that is fine-tuned on our curated dataset with an effective prompt design. Method: Our approach enhances the basic GPT-3.5 model for the text-to-testcase generation task by fine-tuning it on our curated dataset and applying an effective prompt design. We evaluated the effectiveness of our approach on five large-scale open-source software projects. Results: Our approach generated 7k test cases for the open-source projects, achieving 78.5% syntactic correctness, 67.09% requirement alignment, and 61.7% code coverage, substantially outperforming all other evaluated LLMs (basic GPT-3.5, Bloom, and CodeT5). In addition, our ablation study demonstrates that both the fine-tuning and prompting components of the GPT-3.5 model contribute substantial performance improvements. Conclusions: These findings lead us to conclude that fine-tuning and prompting should be considered in the future when building a language model for the text-to-testcase generation task.
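The abstract does not specify the prompt design itself. Purely as an illustrative sketch (the template wording, function name, and example requirement below are assumptions, not the paper's actual prompt), a text-to-testcase prompt for such a model might be assembled like this:

```python
# Hypothetical sketch: building a text-to-testcase prompt for a
# fine-tuned LLM. The template and example requirement are assumed
# for illustration and are not taken from the paper.

def build_testcase_prompt(requirement: str, class_name: str) -> str:
    """Combine a natural-language requirement with instructions that
    steer the model toward a compilable JUnit test case."""
    return (
        "You are given a software requirement. Write a JUnit test case "
        "that verifies it.\n"
        f"Target class: {class_name}\n"
        f"Requirement: {requirement}\n"
        "Respond with compilable Java test code only."
    )

prompt = build_testcase_prompt(
    requirement="The method returns the sum of two integers.",
    class_name="Calculator",
)
print(prompt)
```

The prompt string would then be sent to the fine-tuned model, whose output is checked for syntactic correctness and coverage as described in the Results.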