Deep learning-based approaches have recently been proposed to automate unit test case generation. In this study, we use Transformer-based code models to generate unit tests with the help of project-level Domain Adaptation (DA). Specifically, we take CodeT5, a relatively small language model trained on source code, and fine-tune it on the test generation task; we then fine-tune it further on the data of each target project so that it learns project-specific knowledge (project-level DA). We use the Methods2Test dataset to fine-tune CodeT5 for test generation and the Defects4J dataset for project-level domain adaptation and evaluation. We compare our approach with (a) CodeT5 fine-tuned on test generation without DA, (b) the A3Test tool, and (c) GPT-4, on 5 projects from the Defects4J dataset. The results show that DA increases the line coverage of the generated tests by 18.62%, 19.88%, and 18.02% on average compared to baselines (a), (b), and (c), respectively. Other metrics, such as BLEU and CodeBLEU, show consistent improvements as well. In addition, we show that our approach complements existing search-based test generation tools such as EvoSuite: used alongside them, it increases overall line coverage and mutation score by 34.42% and 6.8% on average, respectively.
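The two-stage schedule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `Checkpoint`, `fine_tune`, and `adapt_per_project` are hypothetical names, the training step is a placeholder that only records which stage ran, and the sketch assumes that every project-level model branches from the same task-tuned checkpoint instead of being trained cumulatively across projects.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# One training pair: (focal method source, reference unit test source).
Example = Tuple[str, str]


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical stand-in for model weights; stores only the stages applied."""
    history: Tuple[str, ...] = ()


def fine_tune(start: Checkpoint, data: List[Example], stage: str) -> Checkpoint:
    """Placeholder for a real training loop (assumption, not a library API).

    Returns a new checkpoint and leaves `start` unchanged, so several
    project-specific models can branch from the same starting point.
    """
    if not data:
        raise ValueError(f"no training data for stage {stage!r}")
    return Checkpoint(start.history + (f"{stage}[{len(data)}]",))


def adapt_per_project(
    pretrained: Checkpoint,
    task_data: List[Example],
    project_data: Dict[str, List[Example]],
) -> Tuple[Checkpoint, Dict[str, Checkpoint]]:
    """Stage 1: task fine-tuning. Stage 2: one domain-adapted model per project."""
    # Stage 1: learn the general test generation task once.
    task_model = fine_tune(pretrained, task_data, "task")

    # Stage 2: branch from the task model separately for each target project.
    adapted = {
        name: fine_tune(task_model, examples, f"da:{name}")
        for name, examples in project_data.items()
    }
    return task_model, adapted


if __name__ == "__main__":
    pretrained = Checkpoint(("pretrain",))
    task_data = [("int add(int a, int b)", "void testAdd()")] * 3
    project_data = {  # hypothetical project names
        "project_a": [("String name()", "void testName()")],
        "project_b": [("int size()", "void testSize()")] * 2,
    }
    task_model, adapted = adapt_per_project(pretrained, task_data, project_data)
    for name, ckpt in adapted.items():
        print(name, ckpt.history)
```

In this sketch the task-tuned checkpoint doubles as the "without DA" baseline, while each entry in `adapted` would be evaluated only on its own project.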