Language model (LM) agents are increasingly used to automate complicated tasks in digital environments. Just as humans benefit from powerful software applications, such as integrated development environments, for complex tasks like software engineering, we posit that LM agents represent a new category of end users with their own needs and abilities, and would benefit from specially built interfaces to the software they use. We investigate how interface design affects the performance of language model agents. As a result of this exploration, we introduce SWE-agent: a system that enables LM agents to autonomously use computers to solve software engineering tasks. SWE-agent's custom agent-computer interface (ACI) significantly enhances an agent's ability to create and edit code files, navigate entire repositories, and execute tests and other programs. We evaluate SWE-agent on SWE-bench and HumanEvalFix, achieving state-of-the-art performance on both, with pass@1 rates of 12.5% and 87.7%, respectively, far exceeding the previous state of the art achieved with non-interactive LMs. Finally, we provide insight into how the design of the ACI can impact agents' behavior and performance.