Large Language Models (LLMs) have revolutionized software engineering (SE), showing remarkable proficiency in a variety of coding tasks. Although recent advances have enabled autonomous LLM-based software agents for end-to-end development tasks, these systems are typically designed for specific SE functions. We introduce HyperAgent, a generalist multi-agent system designed to tackle a wide range of SE tasks across different programming languages by mimicking the workflows of human developers. HyperAgent features four specialized agents (Planner, Navigator, Code Editor, and Executor) that together handle the entire lifecycle of an SE task, from initial planning to final verification. HyperAgent sets new performance benchmarks across diverse SE tasks, including GitHub issue resolution on the SWE-Bench benchmark, where it outperforms strong baselines. Furthermore, HyperAgent performs exceptionally well in repository-level code generation (RepoExec) and in fault localization and program repair (Defects4J), often surpassing state-of-the-art baselines.