Recent advances in large language models (LLMs) have significantly improved the automation of software development tasks, including code synthesis, program repair, and test generation. More recently, researchers and industry practitioners have developed various autonomous LLM agents to perform end-to-end software development tasks. These agents can use tools, run commands, observe feedback from the environment, and plan future actions. However, the complexity of these agent-based approaches, together with the limited abilities of current LLMs, raises the following question: do we really have to employ complex autonomous software agents? To attempt to answer this question, we build Agentless, an agentless approach to automatically solving software development problems. Compared to the verbose and complex setup of agent-based approaches, Agentless employs a simple two-phase process of localization followed by repair, without letting the LLM decide future actions or operate complex tools. Our results on the popular SWE-bench Lite benchmark show that, surprisingly, the simple Agentless achieves both the highest performance (27.33%) and the lowest cost ($0.34) among all existing open-source software agents. Furthermore, we manually classified the problems in SWE-bench Lite and found problems that contain the exact ground-truth patch or have insufficient or misleading issue descriptions. We therefore construct SWE-bench Lite-S by excluding such problematic issues, enabling more rigorous evaluation and comparison. Our work highlights the currently overlooked potential of a simple, interpretable technique in autonomous software development. We hope Agentless will help reset the baseline, starting point, and horizon for autonomous software agents, and inspire future work along this crucial direction.