Although many machine learning (ML)-based unit test generation approaches have been proposed and have achieved remarkable performance, they still have several limitations in effectiveness and practical usage. More precisely, existing ML-based approaches (1) generate only part of a unit test, mainly focusing on test oracle generation; (2) produce test prefixes that semantically mismatch their test oracles; and (3) are tightly bound to closed-source models, which compromises data security. To alleviate these limitations, we propose CasModaTest, a cascaded, model-agnostic, and end-to-end unit test generation framework with two cascaded stages: test prefix generation and test oracle generation. We manually build large-scale demo pools to provide CasModaTest with high-quality examples of test prefixes and test oracles. CasModaTest then automatically assembles the generated test prefixes and test oracles and compiles or executes them to check their effectiveness, optionally followed by several attempts to fix errors that occur in the compilation and execution phases. To evaluate the effectiveness of CasModaTest, we conduct large-scale experiments on a widely used dataset (Defects4J) and compare it with four state-of-the-art (SOTA) approaches on two performance measures. The experimental results indicate that CasModaTest outperforms all SOTA approaches by a substantial margin (i.e., 60.62%-352.55% in accuracy and 2.83%-87.27% in focal method coverage). We also run CasModaTest on different open-source LLMs and find that it still achieves significant improvements over the SOTA approaches in end-to-end unit test generation (39.82%-293.96% in accuracy and 9.25%-98.95% in focal method coverage).
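The cascaded workflow described above can be sketched roughly as follows. This is a minimal illustration of the control flow only, not the authors' implementation: every name (`Model`, `Validator`, `build_prompt`, `generate_test`, `max_repairs`) and every prompt string is a hypothetical assumption. The model is treated as any prompt-to-text callable to reflect the model-agnostic design, and the compile-or-execute check is abstracted into a callable that returns an error message or `None`.

```python
from typing import Callable, List, Optional

# All names below are hypothetical; they only illustrate the control flow.
Model = Callable[[str], str]                # any LLM wrapped as prompt -> completion
Validator = Callable[[str], Optional[str]]  # error message, or None if the test compiles and runs


def build_prompt(task: str, demos: List[str], payload: str) -> str:
    """Join a task instruction, demo-pool examples, and the query into one prompt."""
    return "\n\n".join([task, *demos, payload])


def generate_test(
    focal_method: str,
    model: Model,
    validate: Validator,
    prefix_demos: List[str],
    oracle_demos: List[str],
    max_repairs: int = 2,  # assumed repair budget, not a value from the paper
) -> Optional[str]:
    # Stage 1: generate the test prefix from the focal method.
    prefix = model(build_prompt("Write a test prefix.", prefix_demos, focal_method))

    # Stage 2: generate the test oracle, conditioned on the generated prefix.
    oracle = model(
        build_prompt("Write a test oracle.", oracle_demos, focal_method + "\n" + prefix)
    )

    # Assemble both parts into one candidate test.
    test = prefix + "\n" + oracle

    # Validate the candidate; on failure, ask the model for a fix, up to max_repairs times.
    for attempt in range(max_repairs + 1):
        error = validate(test)
        if error is None:
            return test
        if attempt < max_repairs:
            test = model(build_prompt("Fix this test.", [], test + "\n" + error))
    return None
```

Because the model is passed in as a plain callable, a closed-source API client and a locally hosted open-source LLM could be swapped without changing the pipeline; in practice `validate` would wrap a real compiler and test runner.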