Software testing is a critical yet resource-intensive phase of the software development lifecycle. Over the years, various automated tools have been developed to aid in this process. Search-based approaches typically achieve high coverage but produce tests with low readability, whereas large language model (LLM)-based methods generate more human-readable tests but often suffer from low coverage and compilability issues. While the majority of research efforts have focused on improving test coverage and readability, little attention has been paid to enhancing the robustness of bug detection, particularly in exposing corner cases and vulnerable execution paths. To address this gap, we propose AdverTest, a novel adversarial framework for LLM-powered test case generation. AdverTest comprises two interacting agents: a test case generation agent (T) and a mutant generation agent (M). These agents engage in an adversarial loop, in which M persistently creates new mutants that "hack" the blind spots of T's current test suite, while T iteratively refines its test cases to "kill" the challenging mutants produced by M. This interaction loop is guided by both coverage and mutation scores, enabling the system to co-evolve toward both high test coverage and strong bug-detection capability. Experimental results on the Defects4J dataset show that our approach improves fault detection rates by 8.56% over the best existing LLM-based methods and by 63.30% over EvoSuite, while also improving line and branch coverage.
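The adversarial T/M interaction described above can be sketched at a high level as a co-evolution loop driven by the mutation score. This is only an illustrative sketch: the function names (`adversarial_loop`, `make_mutants`, `refine_tests`), the representation of tests as predicates over mutants, and the stopping criterion are all hypothetical simplifications, not the paper's actual implementation.

```python
# Illustrative sketch of the AdverTest adversarial loop (hypothetical API).
# A "test" is modeled as a predicate over mutants: test(mutant) == True
# means the test kills that mutant.

def mutation_score(tests, mutants):
    """Fraction of mutants killed by at least one test in the suite."""
    if not mutants:
        return 1.0
    killed = sum(1 for m in mutants if any(t(m) for t in tests))
    return killed / len(mutants)

def adversarial_loop(seed_tests, seed_mutants, refine_tests, make_mutants,
                     rounds=5, target=0.95):
    tests, mutants = list(seed_tests), list(seed_mutants)
    for _ in range(rounds):
        # Agent M: propose new mutants, keeping only those that survive
        # (i.e., "hack" the blind spots of) the current test suite.
        mutants += [m for m in make_mutants(tests)
                    if not any(t(m) for t in tests)]
        # Agent T: refine the suite with new tests targeting survivors.
        tests += refine_tests(mutants, tests)
        # Stop once the mutation score is high enough.
        if mutation_score(tests, mutants) >= target:
            break
    return tests, mutants
```

In a real system the two agents would be LLM-backed and the score would be computed by compiling and executing the suite against each mutant; here both are stubbed so the control flow of the loop is visible.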