SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

Language model (LM) agents are increasingly being used to automate complicated tasks in digital environments. Just as humans benefit from powerful software applications, such as integrated development environments, for complex tasks like software engineering, we posit that LM agents represent a new category of end users with their own needs and abilities, and would benefit from specially built interfaces to the software they use. We investigate how interface design affects the performance of language model agents. As a result of this exploration, we introduce SWE-agent: a system that enables LM agents to autonomously use computers to solve software engineering tasks. SWE-agent's custom agent-computer interface (ACI) significantly enhances an agent's ability to create and edit code files, navigate entire repositories, and execute tests and other programs. We evaluate SWE-agent on SWE-bench and HumanEvalFix, achieving state-of-the-art performance on both, with pass@1 rates of 12.5% and 87.7%, respectively, far exceeding the previous state of the art achieved with non-interactive LMs. Finally, we provide insight into how the design of the ACI can impact agents' behavior and performance.
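As a reading aid for the pass@1 figures above, here is a minimal sketch of how such a rate is commonly computed, assuming one attempt per task and a boolean outcome per task. The function name `pass_at_1` and the toy outcomes are hypothetical illustrations; they are not taken from SWE-agent or from either benchmark's evaluation harness.

```python
from typing import Sequence


def pass_at_1(resolved: Sequence[bool]) -> float:
    """Fraction of tasks solved on the first (and only) attempt.

    `resolved[i]` is True if the single attempt on task i passed
    that task's tests. This is an assumed, simplified definition.
    """
    if not resolved:
        raise ValueError("need at least one task outcome")
    return sum(resolved) / len(resolved)


# Toy outcomes for four made-up tasks (not real benchmark results).
toy_outcomes = [True, False, False, True]
print(f"pass@1 = {pass_at_1(toy_outcomes):.1%}")  # prints "pass@1 = 50.0%"
```

Under this definition, a pass@1 rate of 12.5% would correspond to resolving one task in eight on the first attempt.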

