TDAD: Test-Driven Agentic Development - Reducing Code Regressions in AI Coding Agents via Graph-Based Impact Analysis

AI coding agents can resolve real-world software issues, yet they frequently introduce regressions -- breaking tests that previously passed. Current benchmarks focus almost exclusively on resolution rate, leaving regression behavior under-studied. This paper presents TDAD (Test-Driven Agentic Development), an open-source tool that performs pre-change impact analysis for AI coding agents. TDAD builds a dependency map between source code and tests so that before committing a patch, the agent knows which tests to verify and can self-correct. The map is delivered as a lightweight agent skill -- a static text file the agent queries at runtime. Evaluated on SWE-bench Verified with two open-weight models running on consumer hardware (Qwen3-Coder 30B, 100 instances; Qwen3.5-35B-A3B, 25 instances), TDAD reduced regressions by 70% (6.08% to 1.82%) compared to a vanilla baseline. In contrast, adding TDD procedural instructions without targeted test context increased regressions to 9.94% -- worse than no intervention at all. When deployed as an agent skill with a different model and framework, TDAD improved issue-resolution rate from 24% to 32%, confirming that surfacing contextual information outperforms prescribing procedural workflows. All code, data, and logs are publicly available at https://github.com/pepealonso95/TDAD.
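The abstract describes the dependency map only at a high level, so the following is a minimal sketch of the general idea, not TDAD's actual implementation. It assumes that dependencies can be approximated by static `import` statements, that test files are named `test_*.py`, and that the map is rendered as plain `module: tests` lines. The function names (`build_test_map`, `impacted_tests`, `render_skill_file`), the sample repository in `FILES`, and the text format are all hypothetical and chosen only for illustration.

```python
import ast
from collections import defaultdict


def imported_modules(source: str) -> set[str]:
    """Collect dotted module names referenced by import statements in `source`."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # `from pkg import core` may refer to the module pkg.core,
            # so record both the package and package.name candidates.
            modules.add(node.module)
            modules.update(f"{node.module}.{alias.name}" for alias in node.names)
    return modules


def build_test_map(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each repository module to the test files that reach it via imports."""
    module_of = {path: path[:-3].replace("/", ".") for path in files}
    known = set(module_of.values())
    # Keep only edges that point at modules inside the repository.
    deps = {module_of[p]: imported_modules(src) & known for p, src in files.items()}

    test_map: dict[str, set[str]] = defaultdict(set)
    for path, module in module_of.items():
        if not path.split("/")[-1].startswith("test_"):
            continue
        # Depth-first walk over everything this test imports, transitively.
        stack, seen = list(deps[module]), set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            test_map[current].add(path)
            stack.extend(deps[current])
    return dict(test_map)


def impacted_tests(test_map: dict[str, set[str]], changed: list[str]) -> list[str]:
    """Return the sorted test files to verify after `changed` modules are edited."""
    return sorted(set().union(*(test_map.get(m, set()) for m in changed)))


def render_skill_file(test_map: dict[str, set[str]]) -> str:
    """Render the map as static text (hypothetical format, one module per line)."""
    return "\n".join(
        f"{module}: {', '.join(sorted(tests))}"
        for module, tests in sorted(test_map.items())
    )


# Hypothetical in-memory repository: path -> source text.
FILES = {
    "pkg/core.py": "def add(a, b):\n    return a + b\n",
    "pkg/api.py": "from pkg.core import add\n",
    "tests/test_api.py": "import pkg.api\n",
    "tests/test_core.py": "import pkg.core\n",
}

TEST_MAP = build_test_map(FILES)
print(render_skill_file(TEST_MAP))
print(impacted_tests(TEST_MAP, ["pkg.core"]))
```

In this sketch a change to `pkg.core` selects both test files, because `tests/test_api.py` reaches it transitively through `pkg.api`, while a change to `pkg.api` selects only `tests/test_api.py`. Import-only analysis misses relative imports, dynamic imports, and call-level dependencies, so a real tool would likely need a richer graph.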

