Large language models have demonstrated strong capabilities in individual software engineering tasks, yet most autonomous systems still treat issue resolution as a monolithic or pipeline-based process. In contrast, real-world software development is organized as a collaborative activity carried out by teams following shared methodologies, with clear role separation, communication, and review. In this work, we present a fully automated multi-agent system that explicitly models software engineering as an organizational process, replicating the structure of an engineering team. Built on top of agyn, an open-source platform for configuring agent teams, our system assigns specialized agents to roles such as coordination, research, implementation, and review, provides them with isolated sandboxes for experimentation, and enables structured communication. The system follows a defined development methodology for working on issues, including analysis, task specification, pull request creation, and iterative review, and operates without any human intervention. Importantly, the system was designed for real production use and was not tuned for SWE-bench. When evaluated post hoc on SWE-bench 500, it resolves 72.4% of tasks, outperforming single-agent baselines using comparable language models. Our results suggest that replicating team structure, methodology, and communication is a powerful paradigm for autonomous software engineering, and that future progress may depend as much on organizational design and agent infrastructure as on model improvements.
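To make the workflow described above concrete, the following is a minimal sketch of the role sequence (research, coordination, implementation, iterative review) as a plain Python loop. It is illustrative only: the `Issue` class, the `run_workflow` function, the role labels in the log, and the `max_rounds` cap are assumptions made for this example and do not reflect agyn's actual configuration format or API, which the abstract does not show.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Issue:
    """Hypothetical record of one issue moving through the team workflow."""
    title: str
    log: List[str] = field(default_factory=list)
    approved: bool = False


def run_workflow(
    issue: Issue,
    implement: Callable[[Issue, int], str],
    review: Callable[[str], bool],
    max_rounds: int = 3,  # assumed cap on review iterations, not from the source
) -> Issue:
    """Run analysis, task specification, then a PR/review loop."""
    # Stage 1: a research role analyses the issue.
    issue.log.append("researcher: analysis")
    # Stage 2: a coordinating role turns the analysis into a task specification.
    issue.log.append("coordinator: task specification")
    # Stages 3-4: implementation and review alternate until approval
    # or until the round limit is reached.
    for round_no in range(1, max_rounds + 1):
        patch = implement(issue, round_no)
        issue.log.append(f"engineer: pull request v{round_no}")
        if review(patch):
            issue.log.append("reviewer: approved")
            issue.approved = True
            break
        issue.log.append("reviewer: changes requested")
    return issue


if __name__ == "__main__":
    # Stand-in callables; in a real system these would be LLM-backed agents.
    result = run_workflow(
        Issue("Fix off-by-one in pagination"),
        implement=lambda issue, n: f"patch-{n}",
        review=lambda patch: patch.endswith("2"),
    )
    for entry in result.log:
        print(entry)
```

The implementer and reviewer are passed in as plain callables so the control flow can be read and tested without any model calls. In the system the abstract describes, each role would instead be a separate agent with its own sandbox and a structured communication channel.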