In software evolution, resolving issues that emerge in GitHub repositories is a complex challenge that involves not only incorporating new code but also maintaining existing functionality. Large Language Models (LLMs) have shown promise in code generation and understanding but face difficulties with code changes, particularly at the repository level. To overcome these challenges, we empirically study why LLMs mostly fail to resolve GitHub issues and analyze several influencing factors. Motivated by the empirical findings, we propose MAGIS, a novel LLM-based Multi-Agent framework for GitHub Issue reSolution, consisting of four kinds of agents customized for software evolution: Manager, Repository Custodian, Developer, and Quality Assurance Engineer agents. This framework leverages the collaboration of these agents in the planning and coding process to unlock the potential of LLMs to resolve GitHub issues. In experiments, we employ the SWE-bench benchmark to compare MAGIS with popular LLMs, including GPT-3.5, GPT-4, and Claude-2. MAGIS resolves 13.94% of GitHub issues, significantly outperforming the baselines. Specifically, MAGIS achieves an eight-fold increase in resolved ratio over the direct application of GPT-4, the base LLM of our method. We also analyze factors that improve GitHub issue resolution rates, such as line location and task allocation.