This paper presents CoverUp, a novel system that drives the generation of high-coverage Python regression tests by combining coverage analysis with large language models (LLMs). CoverUp improves coverage iteratively, interleaving coverage analysis with dialogs with the LLM that focus its attention on as-yet-uncovered lines and branches. The resulting test suites significantly improve coverage over the current state of the art: compared to CodaMosa, a hybrid LLM / search-based software testing system, CoverUp substantially improves coverage across the board. On a per-module basis, CoverUp achieves median line coverage of 81% (vs. 62%), branch coverage of 53% (vs. 35%), and line+branch coverage of 78% (vs. 55%). We show that CoverUp's iterative, coverage-guided approach is crucial to its effectiveness, contributing to nearly half of its successes.
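To make the iterative, coverage-guided idea concrete, the following is a minimal sketch of such a loop. It is not CoverUp's actual implementation: the function `coverage_guided_loop`, its parameters `run_tests` and `ask_llm`, and the toy stand-ins `FAKE_TESTS`, `fake_run_tests`, and `fake_llm` are all hypothetical names introduced here for illustration. A real system would measure coverage with a coverage tool and send prompts to an actual LLM.

```python
from typing import Callable, List, Set, Tuple


def coverage_guided_loop(
    all_lines: Set[int],
    run_tests: Callable[[List[str]], Set[int]],
    ask_llm: Callable[[Set[int]], str],
    max_rounds: int = 5,
) -> Tuple[List[str], Set[int]]:
    """Alternate coverage measurement with LLM requests (hypothetical sketch)."""
    tests: List[str] = []
    covered = run_tests(tests)
    for _ in range(max_rounds):
        missing = all_lines - covered
        if not missing:
            break  # everything is covered, nothing left to ask for
        # Focus the model on the lines that are still uncovered.
        candidate = ask_llm(missing)
        new_covered = run_tests(tests + [candidate])
        # Keep the candidate only if it reaches previously uncovered lines.
        if new_covered - covered:
            tests.append(candidate)
            covered = covered | new_covered
    return tests, covered


# Toy stand-ins (assumptions): each "test" is a name mapped to the lines it covers.
FAKE_TESTS = {"test_a": {1, 2}, "test_b": {3}, "test_c": {4, 5}}


def fake_run_tests(tests: List[str]) -> Set[int]:
    """Pretend to run the tests and return the set of covered line numbers."""
    covered: Set[int] = set()
    for name in tests:
        covered |= FAKE_TESTS.get(name, set())
    return covered


def fake_llm(missing: Set[int]) -> str:
    """Pretend LLM: return the first canned test that touches a missing line."""
    for name, lines in FAKE_TESTS.items():
        if lines & missing:
            return name
    return "test_useless"  # a test that adds no coverage and gets discarded


tests, covered = coverage_guided_loop(
    all_lines={1, 2, 3, 4, 5, 6},
    run_tests=fake_run_tests,
    ask_llm=fake_llm,
)
```

In this toy run, line 6 is never reachable by any canned test, so the loop exhausts its rounds while discarding the useless candidate. That mirrors the general point that each round is steered by what the previous coverage measurement left uncovered.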