Prior work on training software engineering agents has explored using existing resources, such as issues from GitHub repositories, to construct software engineering tasks and corresponding test suites. These approaches face two key limitations: (1) their reliance on pre-existing GitHub repositories limits flexibility, and (2) their primary focus on issue resolution restricts their applicability to the much wider variety of tasks a software engineer must handle. To overcome these challenges, we introduce SWE-Playground, a novel pipeline for generating environments and trajectories that supports the training of versatile coding agents. Unlike prior efforts, SWE-Playground synthetically generates projects and tasks from scratch using strong language models and agents, eliminating reliance on external data sources. This allows us to tackle a much wider variety of coding tasks, such as reproducing issues by generating unit tests and implementing libraries from scratch. We demonstrate the effectiveness of this approach on three distinct benchmarks. The results indicate that SWE-Playground produces trajectories with dense training signal, enabling agents to reach comparable performance with significantly fewer trajectories than prior work.