Large Language Models (LLMs) have significantly advanced automated test generation, yet existing methods often rely on ground-truth code for verification, risking bug propagation and limiting applicability in test-driven development. We present ConVerTest, a novel two-stage pipeline for synthesizing reliable tests without requiring prior code implementations. ConVerTest integrates three core strategies: (i) Self-Consistency (SC) to generate convergent test cases via majority voting; (ii) Chain-of-Verification (CoVe) for iterative, reasoning-guided code refinement; and (iii) a Dual Execution Agreement to cross-validate code and tests through consensus. Experiments on the BIGCODEBENCH and LESS BASIC PYTHON PROBLEMS (LBPP) benchmarks demonstrate that ConVerTest improves test validity, line coverage, and mutation scores by up to 39%, 28%, and 18%, respectively, over baselines. Our findings highlight ConVerTest as a robust solution for mitigating hallucinations and enhancing the reliability of autonomous software testing agents.
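The Self-Consistency and Dual Execution Agreement strategies above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`majority_vote`, `dual_execution_agreement`) and the `run(code, test)` execution callback are hypothetical, and the consensus rule shown (a test is kept if it passes on a strict majority of candidate implementations) is one plausible instantiation of cross-validation by agreement.

```python
from collections import Counter

def majority_vote(candidate_outputs):
    """Self-Consistency sketch: sample several candidate test cases
    (or expected outputs) and keep the one most samples converge on.
    Returns the winning candidate and its agreement ratio."""
    counts = Counter(candidate_outputs)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(candidate_outputs)

def dual_execution_agreement(code_variants, test_suite, run):
    """Dual Execution Agreement sketch: accept a test only when a
    strict majority of independently generated code variants pass it,
    so neither the code nor the tests are trusted unilaterally.

    `run(code, test)` is a hypothetical callback returning True when
    `test` passes against `code`."""
    accepted = []
    for test in test_suite:
        passes = sum(run(code, test) for code in code_variants)
        if passes > len(code_variants) / 2:
            accepted.append(test)
    return accepted
```

Under this rule, a hallucinated test that contradicts most candidate implementations is filtered out, while a buggy single implementation cannot veto a test that the rest of the consensus supports.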