SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

Language model (LM) agents are increasingly being used to automate complicated tasks in digital environments. Just as humans benefit from powerful software applications, such as integrated development environments, for complex tasks like software engineering, we posit that LM agents represent a new category of end users with their own needs and abilities, and would benefit from specially-built interfaces to the software they use. We investigate how interface design affects the performance of language model agents. As a result of this exploration, we introduce SWE-agent: a system that facilitates LM agents to autonomously use computers to solve software engineering tasks. SWE-agent's custom agent-computer interface (ACI) significantly enhances an agent's ability to create and edit code files, navigate entire repositories, and execute tests and other programs. We evaluate SWE-agent on SWE-bench and HumanEvalFix, achieving state-of-the-art performance on both with a pass@1 rate of 12.5% and 87.7%, respectively, far exceeding the previous state-of-the-art achieved with non-interactive LMs. Finally, we provide insight on how the design of the ACI can impact agents' behavior and performance.

翻译：语言模型（LM）智能体正日益被用于自动化数字环境中的复杂任务。正如人类在进行软件工程等复杂任务时受益于集成开发环境等强大的软件应用程序，我们认为LM智能体代表了一类具有自身需求与能力的新型终端用户，它们将受益于为其所使用的软件专门构建的接口。我们研究了接口设计如何影响语言模型智能体的性能。基于此项探索，我们推出了SWE-agent：一个促进LM智能体自主使用计算机解决软件工程任务的系统。SWE-agent定制的智能体-计算机接口（ACI）显著增强了智能体创建和编辑代码文件、浏览整个代码仓库以及执行测试与其他程序的能力。我们在SWE-bench和HumanEvalFix基准上评估SWE-agent，分别以12.5%和87.7%的pass@1率实现了最先进的性能，远超此前非交互式语言模型所达到的最佳水平。最后，我们深入分析了ACI的设计如何影响智能体的行为与性能。

相关内容

Engineering

关注 7

《工程》是中国工程院（CAE）于2015年推出的国际开放存取期刊。其目的是提供一个高水平的平台，传播和分享工程研发的前沿进展、当前主要研究成果和关键成果；报告工程科学的进展，讨论工程发展的热点、兴趣领域、挑战和前景，在工程中考虑人与环境的福祉和伦理道德，鼓励具有深远经济和社会意义的工程突破和创新，使之达到国际先进水平，成为新的生产力，从而改变世界，造福人类，创造新的未来。期刊链接：https://www.sciencedirect.com/journal/engineering

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【CHI2020-微软】解释可解释性:理解数据科学家使用机器学习的可解释性工具，Interpreting Interpretability: Understanding Data Scientists’Use of Interpretability Tools for Machine Learning

专知会员服务

55+阅读 · 2020年3月8日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日