Software testing is a critical yet resource-intensive phase of the software development lifecycle. Over the years, various automated tools have been developed to aid in this process. Search-based approaches typically achieve high coverage but produce tests with low readability, whereas large language model (LLM)-based methods generate more human-readable tests but often suffer from low coverage and frequent compilation failures. While most research efforts have focused on improving test coverage and readability, little attention has been paid to making bug detection more robust, particularly to exposing corner cases and vulnerable execution paths. To address this gap, we propose AdverTest, a novel adversarial framework for LLM-powered test case generation. AdverTest comprises two interacting agents: a test case generation agent (T) and a mutant generation agent (M). The agents engage in an adversarial loop: M persistently creates new mutants that "hack" the blind spots of T's current test suite, while T iteratively refines its test cases to "kill" the challenging mutants produced by M. The loop is guided by both coverage and mutation scores, enabling the two agents to co-evolve toward both high test coverage and strong bug detection capability. Experimental results on the Defects4J dataset show that our approach improves the fault detection rate by 8.56% over the best existing LLM-based method and by 63.30% over EvoSuite, while also improving line and branch coverage.
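To make the T/M interaction concrete, the following is a minimal, self-contained Python sketch of the adversarial loop described above. It is an illustration only, not AdverTest's implementation: `program`, `MUTANTS`, `agent_T`, and the exhaustive input search are hypothetical stand-ins for the LLM-driven generation and refinement steps, and the mutant pool here is fixed rather than adversarially regenerated by M each round.

```python
def program(x):
    """Program under test: absolute value."""
    return x if x >= 0 else -x

# M's role, simplified: a fixed pool of behavioral variants. In AdverTest,
# M would instead generate fresh mutants targeting the suite's blind spots.
MUTANTS = [
    lambda x: x,                          # negation branch removed
    lambda x: -x,                         # condition inverted
    lambda x: x + 1 if x >= 0 else -x,    # off-by-one in the positive branch
]

def killed(test, mutant):
    """A test kills a mutant if the mutant's output differs from the expected one."""
    x, expected = test
    return mutant(x) != expected

def agent_T(tests, surviving):
    """T's refinement step: for each surviving mutant, search for an input that
    distinguishes it from the original program (a stand-in for LLM test
    synthesis) and add the corresponding test."""
    for mutant in surviving:
        for x in range(-5, 6):
            if mutant(x) != program(x):
                tests.append((x, program(x)))
                break
    return tests

def mutation_score(tests, mutants):
    """Fraction of mutants killed by at least one test."""
    return sum(any(killed(t, m) for t in tests) for m in mutants) / len(mutants)

tests = [(1, 1)]                           # deliberately weak initial suite
for _ in range(5):                         # the adversarial loop
    surviving = [m for m in MUTANTS if not any(killed(t, m) for t in tests)]
    if not surviving:
        break
    tests = agent_T(tests, surviving)

print(f"final suite: {tests}, mutation score: {mutation_score(tests, MUTANTS)}")
```

Running this toy loop, the weak initial test (1, 1) kills the second and third mutants but misses the first; T then adds (-5, 5) to kill it, reaching a mutation score of 1.0. This mirrors the described dynamic in miniature: surviving mutants expose blind spots, and the suite is refined until the mutation score (and, in the full system, coverage) stops improving.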