TDAD: Test-Driven Agentic Development - Reducing Code Regressions in AI Coding Agents via Graph-Based Impact Analysis

AI coding agents can resolve real-world software issues, yet they frequently introduce regressions, breaking tests that previously passed. Current benchmarks focus almost exclusively on resolution rate, leaving regression behavior understudied. This paper presents TDAD (Test-Driven Agentic Development), an open-source tool and benchmark methodology that combines abstract-syntax-tree (AST) based code-test graph construction with weighted impact analysis to surface the tests most likely affected by a proposed change. Evaluated on SWE-bench Verified with two local models (Qwen3-Coder 30B on 100 instances and Qwen3.5-35B-A3B on 25 instances), TDAD's GraphRAG workflow reduced test-level regressions by 70% (6.08% to 1.82%) and improved resolution from 24% to 32% when deployed as an agent skill. Surprisingly, TDD prompting alone increased regressions to 9.94%, revealing that smaller models benefit more from contextual information (which tests to verify) than from procedural instructions (how to do TDD). An autonomous auto-improvement loop raised resolution from 12% to 60% on a 10-instance subset with 0% regression. These findings suggest that for AI agent tool design, surfacing contextual information outperforms prescribing procedural workflows. All code, data, and logs are publicly available at https://github.com/pepealonso95/TDAD.
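The abstract does not spell out TDAD's graph schema or weighting scheme, so the following is only a minimal sketch of the general idea: build a call graph from Python source with the standard `ast` module, then rank tests by how close they sit to a changed function. The function names (`call_graph`, `impacted_tests`), the per-hop `decay` weight, and the toy `SOURCE`/`TESTS` strings are all assumptions made for illustration, not the tool's actual API or data.

```python
import ast
from collections import defaultdict

# Toy code under test and its tests (hypothetical, for illustration only).
SOURCE = '''
def parse(text):
    return text.split(",")

def total(text):
    return sum(int(x) for x in parse(text))

def greet(name):
    return "hi " + name
'''

TESTS = '''
def test_parse():
    assert parse("1,2") == ["1", "2"]

def test_total():
    assert total("1,2") == 3

def test_greet():
    assert greet("a") == "hi a"
'''


def call_graph(code):
    """Map each top-level function to the plain names it calls."""
    graph = {}
    for node in ast.parse(code).body:
        if isinstance(node, ast.FunctionDef):
            graph[node.name] = {
                n.func.id
                for n in ast.walk(node)
                if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
            }
    return graph


def impacted_tests(changed, src_graph, test_graph, decay=0.5):
    """Rank tests by a weight that decays per call-graph hop (assumed scheme)."""
    # Reverse the edges so we can walk from a callee to its callers.
    callers = defaultdict(set)
    for fn, callees in src_graph.items():
        for callee in callees:
            callers[callee].add(fn)

    # Breadth-first walk outward from the changed function.
    weights = {changed: 1.0}
    frontier = [changed]
    while frontier:
        next_frontier = []
        for fn in frontier:
            for caller in callers[fn]:
                if caller not in weights:
                    weights[caller] = weights[fn] * decay
                    next_frontier.append(caller)
        frontier = next_frontier

    # A test's score is the highest weight among the functions it calls.
    scores = {}
    for test, callees in test_graph.items():
        score = max((weights.get(c, 0.0) for c in callees), default=0.0)
        if score > 0:
            scores[test] = score
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


ranking = impacted_tests("parse", call_graph(SOURCE), call_graph(TESTS))
print(ranking)
```

In this toy setup a change to `parse` surfaces `test_parse` first and `test_total` second (because `total` calls `parse`), while `test_greet` is left out. That ranked list is the kind of contextual information ("which tests to verify") the abstract contrasts with procedural TDD instructions.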

