Large language model (LLM) agents have recently made substantial progress in software issue resolution, leveraging techniques such as multi-agent collaboration and Monte Carlo Tree Search (MCTS). However, current agents act as memoryless explorers: they treat each problem separately, without retaining or reusing knowledge from previous repair experiences. This leads to redundant exploration of failed trajectories and missed opportunities to adapt successful issue resolution methods to similar problems. To address this problem, we introduce SWE-Exp, an experience-enhanced approach that distills concise and actionable experience from prior agent trajectories, enabling continuous learning across issues. Our method introduces a multi-faceted experience bank that captures both successful and failed repair attempts. Specifically, it extracts reusable issue resolution knowledge at different levels, from high-level problem comprehension to specific code changes. Experiments show that SWE-Exp achieves a Pass@1 resolution rate of 73.0% on SWE-Bench Verified using the state-of-the-art LLM Claude 4 Sonnet, significantly outperforming prior results under other agent frameworks. Our approach establishes a new paradigm in which automated software engineering agents systematically accumulate and leverage repair expertise, shifting from trial-and-error exploration to strategic, experience-driven issue resolution.
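To make the idea of a multi-level experience bank concrete, the following is a minimal sketch and not the SWE-Exp implementation. The names `Experience` and `ExperienceBank`, the level labels, the sample entries, and the token-overlap (Jaccard) retrieval are all assumptions introduced for illustration. The abstract does not specify how experiences are stored or retrieved.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Experience:
    """One distilled lesson from a past trajectory (hypothetical schema)."""
    issue_summary: str  # short description of the past issue
    level: str          # assumed labels: "comprehension" or "modification"
    succeeded: bool     # whether the originating attempt resolved the issue
    lesson: str         # concise, actionable takeaway


def _tokens(text: str) -> set[str]:
    # Lowercased word tokens; a stand-in for a real similarity model.
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


@dataclass
class ExperienceBank:
    entries: list[Experience] = field(default_factory=list)

    def add(self, exp: Experience) -> None:
        self.entries.append(exp)

    def retrieve(self, issue_text: str, k: int = 2) -> list[Experience]:
        """Return up to k past experiences ranked by Jaccard token overlap."""
        query = _tokens(issue_text)

        def score(exp: Experience) -> float:
            other = _tokens(exp.issue_summary)
            union = query | other
            return len(query & other) / len(union) if union else 0.0

        ranked = sorted(self.entries, key=score, reverse=True)
        # Drop entries that share no tokens with the new issue.
        return [e for e in ranked[:k] if score(e) > 0]


# Illustrative entries (invented for this sketch, not taken from the paper).
exp_ok = Experience(
    "QuerySet filter crashes on empty list",
    "comprehension", True,
    "Reproduce with an empty input before reading the ORM internals.",
)
exp_fail = Experience(
    "QuerySet filter returns wrong results for empty list",
    "modification", False,
    "Patching only the caller left the underlying lookup unchanged.",
)
exp_other = Experience(
    "Sphinx autodoc ignores type hints",
    "modification", True,
    "Check the configuration option before editing the extension.",
)

bank = ExperienceBank()
for e in (exp_ok, exp_fail, exp_other):
    bank.add(e)

hits = bank.retrieve("filter on empty list raises error in QuerySet", k=3)
for h in hits:
    outcome = "success" if h.succeeded else "failure"
    print(f"[{h.level}/{outcome}] {h.lesson}")
```

In this sketch, both successful and failed attempts are kept, so an agent working on a new issue could see what worked as well as what to avoid. A real system would likely replace the token-overlap score with a learned or embedding-based similarity.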