Translating natural-language issue descriptions into viable code fixes remains a fundamental challenge in software engineering. Although agentic large language models (LLMs) have substantially improved automated repository-level debugging, current frameworks still struggle with subtle bugs such as implicit type degradations and complex polymorphic control flows. Because these methods rely heavily on static analysis and coarse execution feedback, they lack visibility into intermediate runtime states. As a result, agents fall into costly, speculative trial-and-error loops that consume tokens without isolating the root cause. To bridge this gap, we propose DAIRA (Dynamic Analysis-enhanced Issue Resolution Agent), a pioneering automated repair framework that natively embeds dynamic analysis into the agent's reasoning cycle. Following a Test Tracing-Driven methodology, DAIRA uses lightweight monitors to extract critical runtime data (such as variable mutations and call stacks) and synthesizes it into structured semantic reports. This mechanism shifts the agent's behavior from blind guesswork to evidence-based, deterministic deduction. When powered by Gemini 3 Flash Preview, DAIRA establishes a new state of the art (SOTA), resolving 79.4% of issues on the SWE-bench Verified dataset. Compared with existing baselines, our framework not only resolves highly complex defects but also reduces overall inference cost by roughly 10% and input token consumption by approximately 25%.
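The idea of a lightweight runtime monitor can be illustrated with a minimal Python sketch built on the standard library's `sys.settrace` hook. This is not DAIRA's implementation, whose monitors and report format are not specified here. `TraceReport`, `trace_run`, `to_text`, `average`, and `run_test` are hypothetical names chosen for illustration. The sketch runs a test function, records caller-to-callee edges (a flattened view of the call stack), logs each change to a local variable as a `repr` string, and renders the events as a plain-text report that an agent could read.

```python
import sys
from dataclasses import dataclass, field
from types import FrameType
from typing import Any, Callable, Optional


@dataclass
class TraceReport:
    """Hypothetical structured record of runtime events observed while a test ran."""
    calls: list[str] = field(default_factory=list)                       # "caller -> callee" edges
    mutations: list[tuple[str, str, str]] = field(default_factory=list)  # (function, variable, new value repr)
    returns: list[tuple[str, str]] = field(default_factory=list)         # (function, return value repr)

    def to_text(self) -> str:
        """Render the recorded events as a compact plain-text report."""
        lines = ["CALLS:"] + [f"  {c}" for c in self.calls]
        lines.append("MUTATIONS:")
        lines += [f"  {fn}: {var} = {val}" for fn, var, val in self.mutations]
        lines.append("RETURNS:")
        lines += [f"  {fn} returned {val}" for fn, val in self.returns]
        return "\n".join(lines)


def trace_run(func: Callable[[], Any]) -> tuple[Any, TraceReport]:
    """Run `func` under sys.settrace, collecting calls, local-variable changes, and return values."""
    report = TraceReport()
    snapshots: dict[FrameType, dict[str, str]] = {}  # last seen locals (as reprs) per live frame

    def record_changes(frame: FrameType) -> None:
        # Diff the frame's current locals against its previous snapshot.
        previous = snapshots.get(frame, {})
        current = {name: repr(value) for name, value in frame.f_locals.items()}
        for name, value in current.items():
            if previous.get(name) != value:
                report.mutations.append((frame.f_code.co_name, name, value))
        snapshots[frame] = current

    def local_tracer(frame: FrameType, event: str, arg: Any):
        if event == "line":
            record_changes(frame)
        elif event == "return":
            record_changes(frame)
            report.returns.append((frame.f_code.co_name, repr(arg)))
            snapshots.pop(frame, None)  # frame has finished; drop its snapshot
        return local_tracer

    def global_tracer(frame: FrameType, event: str, arg: Any) -> Optional[Callable]:
        if event == "call":
            caller = frame.f_back.f_code.co_name if frame.f_back else "<root>"
            report.calls.append(f"{caller} -> {frame.f_code.co_name}")
            return local_tracer  # trace line and return events inside this new frame
        return None

    sys.settrace(global_tracer)
    try:
        result = func()
    finally:
        sys.settrace(None)  # always detach the tracer, even if the test raises
    return result, report


# Toy example: integer inputs, float result.
def average(values):
    total = 0
    for v in values:
        total = total + v
    return total / len(values)


def run_test():
    return average([1, 2, 3])


result, report = trace_run(run_test)
print(report.to_text())
```

In this toy run, the report shows `total` accumulating integer values while `average` returns `2.0`, a float: the kind of intermediate state that pass/fail execution feedback alone would not expose. A practical monitor would at least need to restrict tracing to frames from the repository under test and cap the size of recorded values.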