In software development, resolving issues that emerge in GitHub repositories is a complex challenge that involves not only incorporating new code but also maintaining existing code. Large Language Models (LLMs) have shown promise in code generation but struggle to resolve GitHub issues, particularly at the repository level. To address this challenge, we empirically study why LLMs fail to resolve GitHub issues and analyze the major contributing factors. Motivated by these findings, we propose MAGIS, a novel LLM-based Multi-Agent framework for GitHub Issue reSolution, consisting of four agents customized for software evolution: Manager, Repository Custodian, Developer, and Quality Assurance Engineer. The framework leverages collaboration among these agents in the planning and coding process to unlock the potential of LLMs to resolve GitHub issues. In experiments, we use the SWE-bench benchmark to compare MAGIS with popular LLMs, including GPT-3.5, GPT-4, and Claude-2. MAGIS resolves 13.94% of GitHub issues, significantly outperforming the baselines. Specifically, MAGIS achieves an eight-fold increase in resolved ratio over the direct application of GPT-4, an advanced LLM.