SGAgent: Suggestion-Guided LLM-Based Multi-Agent Framework for Repository-Level Software Repair

Large Language Models (LLMs) have enabled intelligent agents that autonomously interact with environments and invoke external tools. Recently, agent-based software repair has drawn wide attention, as repair agents can localize bugs, generate patches, and achieve state-of-the-art performance on repository-level benchmarks (e.g., SWE-Bench). However, existing approaches usually adopt a localize-then-fix paradigm that jumps directly from "where the bug is" to "how to fix it", leaving a fundamental reasoning gap. To bridge this gap, we propose SGAgent, a Suggestion-Guided multi-Agent framework for repository-level software repair that follows a localize-suggest-fix paradigm. SGAgent introduces a suggestion phase to strengthen the transition from localization to repair: the suggester starts from the buggy locations, incrementally retrieves relevant context until it fully understands the bug, and then provides actionable repair suggestions. We further construct a Knowledge Graph (KG) from the target repository and develop a KG-based toolkit to strengthen SGAgent's global contextual awareness and repository-level reasoning. Three specialized sub-agents (i.e., localizer, suggester, and fixer) collaborate to achieve automated end-to-end software repair. We evaluate SGAgent on SWE-Bench-Lite. With Claude-3.5 as the base model, SGAgent achieves 51.3% repair accuracy, 81.2% file-level localization accuracy, and 52.4% function-level localization accuracy at an average cost of $1.48 per instance, outperforming all baselines that use the same base model. SGAgent also generalizes well across base LLMs, reaching a 60.7% resolution rate with Claude-4. When extended to vulnerability repair, it achieves 48.0% on VUL4J and VJBench, demonstrating strong generalization across tasks and programming languages.
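To make the localize-suggest-fix flow more concrete, the following is a minimal sketch and not the SGAgent implementation. It assumes a toy knowledge graph that records only function definitions and direct call edges, and it replaces the three LLM-driven sub-agents with deterministic stand-ins. All names (`RepoGraph`, `localize`, `suggest`, `fix`, `repair`, `max_hops`) are hypothetical and chosen for illustration only.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class RepoGraph:
    """Toy knowledge graph: function nodes plus caller -> callee edges."""
    calls: dict[str, set[str]] = field(default_factory=dict)

    @classmethod
    def from_source(cls, source: str) -> "RepoGraph":
        graph = cls()
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.FunctionDef):
                # Record only direct calls by bare name, e.g. foo(x), not obj.foo(x).
                graph.calls[node.name] = {
                    n.func.id
                    for n in ast.walk(node)
                    if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
                }
        return graph

    def neighbors(self, name: str) -> set[str]:
        """Callees and callers of `name` that are defined in the repository."""
        callees = {c for c in self.calls.get(name, set()) if c in self.calls}
        callers = {f for f, cs in self.calls.items() if name in cs}
        return callees | callers


def localize(issue: str, graph: RepoGraph) -> list[str]:
    # Stand-in for the localizer agent: pick functions named in the issue text.
    return sorted(name for name in graph.calls if name in issue)


def suggest(locations: list[str], graph: RepoGraph, max_hops: int = 2) -> dict:
    # Stand-in for the suggester agent: start from the buggy locations and
    # expand the context through the graph until nothing new is found
    # or the hop budget (an assumed parameter) is exhausted.
    context = set(locations)
    frontier = set(locations)
    for _ in range(max_hops):
        frontier = {n for f in frontier for n in graph.neighbors(f)} - context
        if not frontier:
            break
        context |= frontier
    return {"locations": locations, "context": sorted(context)}


def fix(suggestion: dict) -> str:
    # Stand-in for the fixer agent: a real fixer would emit a patch.
    targets = ", ".join(suggestion["locations"])
    context = ", ".join(suggestion["context"])
    return f"patch {targets} using context: {context}"


def repair(issue: str, source: str) -> str:
    graph = RepoGraph.from_source(source)
    return fix(suggest(localize(issue, graph), graph))


SOURCE = '''
def parse(text):
    return normalize(text).split()

def normalize(text):
    return text.lower()

def report(text):
    return len(parse(text))

def unrelated():
    return 0
'''

ISSUE = "parse returns wrong tokens for mixed-case input"
print(repair(ISSUE, SOURCE))
```

In this toy run the localizer flags only `parse`, while the suggestion step pulls in its callee `normalize` and its caller `report` and leaves `unrelated` out. This mirrors the idea that the context handed to the fixer is gathered by walking the repository graph outward from the buggy location rather than by passing the location alone.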
