Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement

Recent advancements in LLM-based agents have led to significant progress in automatic software engineering, particularly in software maintenance and evolution. Despite these encouraging advances, current research faces two major challenges. First, state-of-the-art (SOTA) performance primarily depends on closed-source models, which significantly limits the technology's accessibility and its potential for customization in diverse software engineering (SE) tasks. Second, these models are predominantly trained on static code data and lack a deep understanding of the dynamic interactions, iterative problem-solving processes, and evolutionary characteristics inherent in software development. To address these challenges, our study adopts a software engineering perspective. We recognize that real-world software maintenance and evolution processes encompass not only static code data but also developers' thought processes, their use of external tools, and the interaction between people in different functional roles. Consequently, we introduce the Lingma SWE-GPT series, comprising Lingma SWE-GPT 7B and 72B. By learning from and simulating real-world code submission activities, Lingma SWE-GPT systematically incorporates the dynamic interactions and iterative problem-solving inherent in the software development process, thereby achieving a more comprehensive understanding of software improvement processes. We conducted experimental evaluations on the SWE-bench Verified benchmark. The results demonstrate that Lingma SWE-GPT 72B successfully resolves 30.20% of the GitHub issues, a significant improvement in automatic issue resolution (a 22.76% relative improvement over Llama 3.1 405B) that approaches the performance of closed-source models (GPT-4o resolves 31.80% of the issues). Notably, Lingma SWE-GPT 7B resolves 18.20% of the issues, highlighting the potential for applying smaller models to automated software engineering (ASE) tasks.
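
The abstract reports its headline result as a relative improvement, which is easy to confuse with a difference in percentage points. The short sketch below works through that arithmetic using only the figures quoted above. The function names are illustrative, and the baseline resolve rate is not stated in the abstract: it is back-computed from the 30.20% and 22.76% figures and should be read as an implied value, not a reported one.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


def implied_baseline(new: float, rel_improvement_pct: float) -> float:
    """Baseline score implied by a new score and its relative improvement."""
    return new / (1.0 + rel_improvement_pct / 100.0)


# Figures quoted in the abstract (percent of SWE-bench Verified issues resolved).
swe_gpt_72b = 30.20
gpt_4o = 31.80
rel_gain_over_llama = 22.76  # relative improvement, in percent

# Assumption: derived from the two numbers above, not reported in the abstract.
llama_implied = implied_baseline(swe_gpt_72b, rel_gain_over_llama)

print(f"Implied Llama 3.1 405B resolve rate: {llama_implied:.2f}%")
print(f"Absolute gain over that baseline: {swe_gpt_72b - llama_implied:.2f} percentage points")
print(f"Remaining gap to GPT-4o: {gpt_4o - swe_gpt_72b:.2f} percentage points")
```

Read this way, the 22.76% figure corresponds to an absolute gain of several percentage points over the implied baseline, while the remaining gap to GPT-4o is 1.60 percentage points.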
