InspectCoder: Dynamic Analysis-Enabled Self Repair through Interactive LLM-Debugger Collaboration

Large Language Models (LLMs) frequently generate buggy code with complex logic errors that are challenging to diagnose. Existing LLM-based self-repair approaches either conduct intensive static semantic analysis or rely on superficial execution logs. As a result, they miss the in-depth runtime behaviors that often expose the root causes of bugs, because they lack the interactive dynamic analysis capabilities that make human debugging effective. We present InspectCoder, the first agentic program repair system that empowers LLMs to actively conduct dynamic analysis via interactive debugger control. Our dual-agent framework enables strategic breakpoint placement, targeted state inspection, and incremental runtime experimentation within stateful debugger sessions. Unlike existing methods that follow fixed log collection procedures, InspectCoder adaptively inspects and perturbs relevant intermediate states at runtime. It also leverages immediate process rewards from debugger feedback to guide multi-step reasoning, transforming the LLM debugging paradigm from blind trial-and-error into systematic root cause diagnosis. We conduct comprehensive experiments on two challenging self-repair benchmarks: BigCodeBench-R and LiveCodeBench-R. InspectCoder achieves 5.10%-60.37% relative improvements in repair accuracy over the strongest baseline, while delivering 1.67x-2.24x higher bug-fix efficiency. We also contribute InspectWare, an open-source middleware that abstracts debugger complexities and maintains stateful debugging sessions across mainstream Python testing frameworks. Our work provides actionable insights into interactive LLM-debugger systems, demonstrating the significant potential of LLM-driven dynamic analysis for automated software engineering.
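The abstract does not describe the interface of InspectCoder or InspectWare. To show what "breakpoint placement, state inspection, and runtime perturbation" can mean mechanically, here is a minimal sketch that uses only Python's standard `sys.settrace` hook and `frame.f_locals`. It is not InspectWare's API. The names `average_positive`, `run_with_breakpoint`, `set_count_to_positives`, and `RETURN_LINE` are all hypothetical. The sketch assumes two things: breakpoints are given as a line offset from the function's `def` line, and CPython writes back local-variable changes made inside a trace function. That write-back is the same mechanism pdb relies on for assignments.

```python
import sys
from types import FrameType
from typing import Any, Callable, Dict, List, Optional, Tuple


def average_positive(nums):
    total = 0
    count = 0
    for n in nums:
        if n > 0:
            total += n
        count += 1  # injected bug: counts every element, not only positives
    return total / count


# Hypothetical breakpoint location: 7 lines below `def average_positive`,
# i.e. the `return total / count` statement.
RETURN_LINE = 7


def run_with_breakpoint(
    func: Callable[..., Any],
    args: Tuple[Any, ...],
    line_offset: int,
    on_break: Optional[Callable[[FrameType], None]] = None,
) -> Tuple[Any, List[Dict[str, Any]]]:
    """Run func(*args) under a trace hook that pauses at one source line.

    The breakpoint is the line `line_offset` lines below func's `def` line.
    Each time it is reached (before the line executes), a copy of the frame's
    locals is recorded, then on_break(frame) may overwrite local variables.
    """
    code = func.__code__
    target_line = code.co_firstlineno + line_offset
    snapshots: List[Dict[str, Any]] = []

    def tracer(frame: FrameType, event: str, arg: Any):
        if frame.f_code is not code:
            return None  # skip frames of unrelated functions
        if event == "line" and frame.f_lineno == target_line:
            snapshots.append(dict(frame.f_locals))  # state inspection
            if on_break is not None:
                on_break(frame)  # runtime perturbation
        return tracer  # keep receiving line events for this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # restore any pre-existing tracer
    return result, snapshots


def set_count_to_positives(frame: FrameType) -> None:
    # Hypothesis under test: `count` should include only positive numbers.
    positives = sum(1 for n in frame.f_locals["nums"] if n > 0)
    frame.f_locals["count"] = positives


data = [3, -1, 5, -2]
buggy, snaps = run_with_breakpoint(average_positive, (data,), RETURN_LINE)
patched, _ = run_with_breakpoint(
    average_positive, (data,), RETURN_LINE, on_break=set_count_to_positives
)
print(snaps[0]["total"], snaps[0]["count"])  # 8 4
print(buggy, patched)  # 2.0 4.0
```

At the breakpoint, the snapshot shows `total` is correct (8) but `count` is 4. Overwriting `count` before the return line runs changes the output from 2.0 to the expected 4.0. This localizes the fault to the counting logic without editing the source. This is the kind of immediate runtime feedback the abstract describes. How InspectCoder actually turns such feedback into process rewards is not specified here.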
