Achieving mastery of real-world software engineering tasks is fundamentally bottlenecked by the scarcity of large-scale, high-quality training data. Scaling such data has been limited by the complexity of environment setup, unit-test generation, and problem-statement curation. In this paper, we propose ScaleSWE, an automated, sandboxed multi-agent workflow designed to construct high-quality SWE data at scale. The system coordinates three specialized agents for environment setup, test creation, and problem-description synthesis to process 6 million pull requests across 5,200 repositories, producing ScaleSWE-Data: 100K verified SWE instances, the largest such dataset to date. It substantially surpasses existing real-world datasets in repository diversity and reflects realistic task complexity. We further demonstrate the dataset's utility for training by distilling 71,498 high-quality trajectories and fine-tuning Qwen3-30B-A3B-Instruct to produce ScaleSWE-Agent. Our agent achieves a 64% resolve rate on SWE-bench Verified, a nearly three-fold improvement over the base model. ScaleSWE provides a scalable, reproducible approach to data construction for advancing LLM-based software engineering. ScaleSWE-Data will be publicly available.