Large language models excel at generating individual functions or single files of code, yet generating complete repositories from scratch remains a fundamental challenge. This capability is key to building coherent software systems from high-level specifications and realizing the full potential of automated code generation. The process requires planning at two levels: deciding what features and modules to build (proposal stage) and defining their implementation details (implementation stage). Current approaches rely on natural language planning, whose inherent ambiguity and lack of structure often produce unclear specifications, misaligned components, and brittle designs. To address these limitations, we introduce the Repository Planning Graph (RPG), a structured representation that encodes capabilities, file structures, data flows, and functions in a unified graph. By replacing free-form natural language with an explicit blueprint, RPG enables consistent long-horizon planning for repository generation. Building on RPG, we develop ZeroRepo, a graph-driven framework that operates in three stages: proposal-level planning, implementation-level construction, and graph-guided code generation with test validation. To evaluate ZeroRepo, we construct RepoCraft, a benchmark of six real-world projects with 1,052 tasks. On RepoCraft, ZeroRepo produces nearly 36K lines of code and 445K code tokens, on average 3.9$\times$ larger than the strongest baseline (Claude Code) and 68$\times$ larger than other baselines. It achieves 81.5% coverage and 69.7% test accuracy, improving over Claude Code by 27.3 and 35.8 points, respectively. Further analysis shows that RPG models complex dependencies, enables more sophisticated planning through near-linear scaling, and improves agent understanding of repositories, thus accelerating localization. Our data and code are available at https://github.com/microsoft/RPG-ZeroRepo.
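To make the idea of a unified planning graph concrete, the following is a minimal sketch of how capabilities, files, functions, and data flows might be held in one structure and used to order code generation. It is not the RPG or ZeroRepo implementation. The class names (`Node`, `PlanningGraph`), the node kinds, the example node names, and the use of a topological sort to pick a generation order are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Optional


@dataclass
class Node:
    """One planning unit. The `kind` labels are hypothetical."""
    name: str
    kind: str                      # "capability", "file", or "function"
    parent: Optional[str] = None   # containment: capability -> file -> function


@dataclass
class PlanningGraph:
    """Illustrative graph: containment via `parent`, data flow via `flows`."""
    nodes: dict = field(default_factory=dict)
    flows: dict = field(default_factory=dict)  # consumer -> set of producers

    def add_node(self, name, kind, parent=None):
        if parent is not None and parent not in self.nodes:
            raise KeyError(f"unknown parent: {parent}")
        self.nodes[name] = Node(name, kind, parent)
        self.flows.setdefault(name, set())

    def add_flow(self, producer, consumer):
        """Record that `consumer` depends on data produced by `producer`."""
        for n in (producer, consumer):
            if n not in self.nodes:
                raise KeyError(f"unknown node: {n}")
        self.flows[consumer].add(producer)

    def generation_order(self):
        """Functions ordered so producers come before their consumers."""
        ordered = TopologicalSorter(self.flows).static_order()
        return [n for n in ordered if self.nodes[n].kind == "function"]


# Hypothetical two-capability plan.
g = PlanningGraph()
g.add_node("data_loading", "capability")
g.add_node("loader.py", "file", parent="data_loading")
g.add_node("load_csv", "function", parent="loader.py")
g.add_node("preprocessing", "capability")
g.add_node("scale.py", "file", parent="preprocessing")
g.add_node("normalize", "function", parent="scale.py")
g.add_flow("load_csv", "normalize")

order = g.generation_order()
print(order)
```

In this sketch, an explicit edge from `load_csv` to `normalize` forces the producer to be generated first, and a cyclic plan is rejected by `TopologicalSorter` with a `CycleError`. That is one way a structured blueprint can expose inconsistencies that a free-form natural language plan might hide.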