Considerable effort has been devoted to automating software debugging, a time-consuming process that involves fault localization and repair generation. Recently, Large Language Models (LLMs) have shown great potential in automated debugging. However, we identify three challenges facing both traditional and LLM-based debugging tools: 1) imperfect upstream fault localization degrades downstream repair, 2) complex logic errors are handled poorly, and 3) program context is ignored. To address them, we propose FixAgent, the first automated, unified debugging framework, built on LLM agent synergy. FixAgent performs end-to-end localization, repair, and analysis of bugs. Our insight is that LLMs can benefit from general software engineering principles that human developers rely on when debugging, such as rubber duck debugging, which lead to a better understanding of program functionality and logic bugs. We therefore introduce three designs inspired by rubber duck debugging to address these challenges: agent specialization and synergy, key variable tracking, and program context comprehension. These designs require the LLMs to give explicit explanations and force them to focus on crucial program logic information. Experiments on the widely used QuixBugs dataset show that FixAgent correctly fixes 79 out of 80 bugs, 9 of which had never been fixed before. It also plausibly patches 1.9X more defects than the best-performing repair tool on CodeFlaws, even with no bug location information and with fewer than 0.6% of the sampling times. On average, FixAgent achieves about 20% more plausible and correct fixes than its base model across different LLMs, demonstrating the effectiveness of our designs. Moreover, FixAgent's correctness rate reaches a remarkable 97.26%, indicating that it can potentially overcome the overfitting issue of existing approaches.
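To make the distinction between plausible and correct fixes concrete, the following is a minimal sketch that assumes the definitions common in program repair work: a patch is plausible if it passes all available tests, it is correct if it is also semantically equivalent to the reference fix, and the correctness rate is the ratio of correct to plausible patches. The `PatchResult` type, the `correctness_rate` function, and the sample data are hypothetical illustrations, not part of FixAgent or its evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchResult:
    """Outcome of one generated patch (hypothetical record type)."""
    bug_id: str
    passes_all_tests: bool    # plausible: the patched program passes every test
    matches_reference: bool   # correct: judged equivalent to the reference fix


def correctness_rate(results):
    """Return correct / plausible, or 0.0 when no patch is plausible."""
    plausible = [r for r in results if r.passes_all_tests]
    if not plausible:
        return 0.0
    correct = [r for r in plausible if r.matches_reference]
    return len(correct) / len(plausible)


# Made-up sample data, for illustration only.
results = [
    PatchResult("bug-a", passes_all_tests=True, matches_reference=True),
    PatchResult("bug-b", passes_all_tests=True, matches_reference=False),  # overfits the tests
    PatchResult("bug-c", passes_all_tests=False, matches_reference=False),
    PatchResult("bug-d", passes_all_tests=True, matches_reference=True),
]

rate = correctness_rate(results)
print(f"correctness rate: {rate:.2%}")
```

Under these assumed definitions, a low correctness rate means that many patches pass the tests without actually fixing the bug, which is the overfitting issue the abstract refers to.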