AI coding agents can resolve real-world software issues, yet they frequently introduce regressions, breaking tests that previously passed. Current benchmarks focus almost exclusively on resolution rate, leaving regression behavior under-studied. This paper presents TDAD (Test-Driven Agentic Development), an open-source tool that performs pre-change impact analysis for AI coding agents. TDAD builds a dependency map between source code and tests so that, before committing a patch, the agent knows which tests to verify and can self-correct. The map is delivered as a lightweight agent skill: a static text file the agent queries at runtime. Evaluated on SWE-bench Verified with two open-weight models running on consumer hardware (Qwen3-Coder 30B, 100 instances; Qwen3.5-35B-A3B, 25 instances), TDAD reduced regressions by 70% (from 6.08% to 1.82%) compared to a vanilla baseline. In contrast, adding TDD procedural instructions without targeted test context increased regressions to 9.94%, worse than no intervention at all. When deployed as an agent skill with a different model and framework, TDAD improved the issue-resolution rate from 24% to 32%, confirming that surfacing contextual information outperforms prescribing procedural workflows. All code, data, and logs are publicly available at https://github.com/pepealonso95/TDAD.