Recent advances in LLM-based agents have driven significant progress in automatic software engineering (ASE), particularly in software maintenance and evolution. Despite these encouraging advances, current research faces two major challenges. First, state-of-the-art (SOTA) performance depends primarily on closed-source models, which significantly limits the technology's accessibility and its potential for customization across diverse software engineering (SE) tasks. Second, these models are predominantly trained on static code data and therefore lack a deep understanding of the dynamic interactions, iterative problem-solving processes, and evolutionary characteristics inherent in software development. To address these challenges, our study adopts a software engineering perspective. We recognize that real-world software maintenance and evolution involve not only static code data but also developers' thought processes, their use of external tools, and the interaction among personnel in different functional roles. Consequently, we introduce the Lingma SWE-GPT series, comprising Lingma SWE-GPT 7B and 72B. By learning from and simulating real-world code submission activities, Lingma SWE-GPT systematically incorporates the dynamic interactions and iterative problem-solving inherent in the software development process, thereby achieving a more comprehensive understanding of software improvement processes. We evaluated the models on the SWE-bench Verified benchmark. The results demonstrate that Lingma SWE-GPT 72B resolves 30.20% of the GitHub issues, a significant improvement in automatic issue resolution (a 22.76% relative improvement over Llama 3.1 405B) that approaches the performance of closed-source models (GPT-4o resolves 31.80% of the issues). Notably, Lingma SWE-GPT 7B resolves 18.20% of the issues, highlighting the potential of applying smaller models to ASE tasks.
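The relative-improvement figure can be checked against the resolve rates quoted above. The following is a minimal sketch, not part of the original evaluation: the function names are illustrative, and the Llama 3.1 405B resolve rate is not stated in the abstract, so it is back-computed here from the 22.76% relative improvement and should be read as an implied value only.

```python
# Resolve rates quoted in the abstract (percent of SWE-bench Verified issues).
SWE_GPT_72B = 30.20
SWE_GPT_7B = 18.20
GPT_4O = 31.80
RELATIVE_GAIN_OVER_LLAMA = 22.76  # percent, as quoted in the abstract


def relative_improvement(new_rate: float, old_rate: float) -> float:
    """Relative improvement of new_rate over old_rate, in percent."""
    return (new_rate - old_rate) / old_rate * 100


def implied_baseline(new_rate: float, relative_gain_pct: float) -> float:
    """Baseline rate implied by a new rate and its relative gain (assumption:
    the gain is defined as (new - old) / old)."""
    return new_rate / (1 + relative_gain_pct / 100)


# Implied Llama 3.1 405B resolve rate (derived, not stated in the abstract).
llama_implied = implied_baseline(SWE_GPT_72B, RELATIVE_GAIN_OVER_LLAMA)

# Absolute gap to GPT-4o, in percentage points.
gap_to_gpt4o = round(GPT_4O - SWE_GPT_72B, 2)

print(f"Implied Llama 3.1 405B resolve rate: {llama_implied:.2f}%")
print(f"Gap between SWE-GPT 72B and GPT-4o: {gap_to_gpt4o:.2f} points")
```

Note the distinction the sketch makes explicit: the comparison with Llama 3.1 405B is a relative improvement (a ratio of rates), whereas the comparison with GPT-4o is most naturally read as an absolute gap in percentage points.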