Software testing has progressed toward intelligent automation, yet current AI-based test generators still suffer from static, single-shot outputs that frequently produce invalid, redundant, or non-executable tests due to the lack of execution-aware feedback. This paper introduces an agentic multi-model testing framework: a closed-loop, self-correcting system in which a Test Generation Agent, an Execution and Analysis Agent, and a Review and Optimization Agent collaboratively generate, execute, analyze, and refine tests until convergence. By using sandboxed execution, detailed failure reporting, and iterative regeneration or patching of failing tests, the framework autonomously improves test quality and expands coverage. Integrated into a CI/CD-compatible pipeline, it leverages reinforcement signals from coverage metrics and execution outcomes to guide refinement. Empirical evaluations on microservice-based applications show up to a 60% reduction in invalid tests, a 30% coverage improvement, and significantly reduced human effort compared to single-model baselines, demonstrating that multi-agent, feedback-driven loops can evolve software testing into an autonomous, continuously learning quality assurance ecosystem for self-healing, high-reliability codebases.
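To make the closed-loop cycle concrete, the sketch below outlines one possible orchestration of the three agents, assuming a simple coverage-based convergence check; all class and function names (TestGenerationAgent, ExecutionAnalysisAgent, ReviewOptimizationAgent, TestResult, closed_loop) are hypothetical placeholders for illustration, not the framework's actual interfaces.

```python
# A minimal conceptual sketch of the closed-loop refinement cycle summarized
# above. All names here (TestGenerationAgent, ExecutionAnalysisAgent,
# ReviewOptimizationAgent, TestResult, closed_loop, target_coverage) are
# hypothetical illustrations, not the framework's actual API.

from dataclasses import dataclass

@dataclass
class TestResult:
    passed: bool
    coverage: float           # fraction of code covered, 0.0..1.0
    failure_report: str = ""  # detailed diagnostics fed back to the generator

class TestGenerationAgent:
    def generate(self, source_code: str, feedback: list[str]) -> list[str]:
        """Produce candidate tests, conditioning on prior failure reports."""
        raise NotImplementedError

class ExecutionAnalysisAgent:
    def execute(self, tests: list[str]) -> list[TestResult]:
        """Run each test in a sandbox; collect pass/fail and coverage signals."""
        raise NotImplementedError

class ReviewOptimizationAgent:
    def refine(self, tests: list[str], results: list[TestResult]):
        """Keep passing tests, patch or drop failing ones, emit failure reports."""
        raise NotImplementedError

def closed_loop(source_code, gen, exe, rev, target_coverage=0.9, max_iters=10):
    """Generate, execute, analyze, and refine tests until convergence."""
    tests, feedback = [], []
    for _ in range(max_iters):
        tests += gen.generate(source_code, feedback)      # (re)generate tests
        results = exe.execute(tests)                      # sandboxed execution
        tests, feedback = rev.refine(tests, results)      # review and patch
        coverage = max((r.coverage for r in results), default=0.0)
        if coverage >= target_coverage and not feedback:  # convergence check
            return tests
    return tests
```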