Current repository agents suffer from a reasoning disconnect caused by fragmented representations: existing methods rely on isolated API documentation or on dependency graphs that lack semantic depth. We view repository comprehension and generation as inverse processes within a unified cycle: generation expands intent into implementation, while comprehension compresses implementation back into intent. Building on this view, we propose RPG-Encoder, a framework that generalizes the Repository Planning Graph (RPG) from a static generative blueprint into a unified, high-fidelity representation. RPG-Encoder closes the reasoning loop through three mechanisms: (1) encoding raw code into an RPG that combines lifted semantic features with code dependencies; (2) evolving the graph topology incrementally, which decouples maintenance cost from repository scale and reduces maintenance overhead by 95.7%; and (3) serving as a unified interface for structure-aware navigation. In evaluations, RPG-Encoder achieves state-of-the-art localization performance on SWE-bench Verified with 93.7% Acc@5 and exceeds the best baseline by over 10% in localization accuracy on SWE-bench Live Lite, demonstrating fine-grained precision in complex codebases. It also achieves 98.5% reconstruction coverage on RepoCraft, confirming that the RPG mirrors the original codebase with high fidelity and thereby closing the loop between intent and implementation.
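Mechanisms (1) and (2) can be illustrated with a minimal sketch, written under our own assumptions rather than taken from RPG-Encoder: each top-level function becomes a node carrying a "lifted" semantic feature plus call-dependency edges, and updates re-encode only files whose content changed. The names `Node`, `lift_feature`, `encode_file`, and `RepoGraph` are hypothetical, and the docstring-based `lift_feature` is a placeholder for the model-based semantic lifting a real system would perform.

```python
import ast
import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical graph node: a code entity plus a lifted semantic feature."""
    name: str
    feature: str  # placeholder for a model-generated intent description
    calls: set[str] = field(default_factory=set)  # dependency edges (callee names)


def lift_feature(fn: ast.FunctionDef) -> str:
    # Placeholder lifting: first docstring line, else the name with spaces.
    doc = ast.get_docstring(fn)
    return doc.splitlines()[0] if doc else fn.name.replace("_", " ")


def encode_file(source: str) -> dict[str, Node]:
    """Encode a file's top-level functions into nodes with direct-call edges."""
    nodes: dict[str, Node] = {}
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.FunctionDef):
            calls = {
                n.func.id
                for n in ast.walk(stmt)
                if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
            }
            nodes[stmt.name] = Node(stmt.name, lift_feature(stmt), calls)
    return nodes


class RepoGraph:
    """Hypothetical incrementally maintained graph keyed by file path."""

    def __init__(self) -> None:
        # path -> (content hash, nodes encoded from that file)
        self.files: dict[str, tuple[str, dict[str, Node]]] = {}

    def update(self, repo: dict[str, str]) -> list[str]:
        """Drop deleted files, re-encode added or modified ones; return re-encoded paths."""
        for path in list(self.files):
            if path not in repo:
                del self.files[path]
        touched = []
        for path, source in repo.items():
            digest = hashlib.sha256(source.encode()).hexdigest()
            if path in self.files and self.files[path][0] == digest:
                continue  # unchanged: reuse cached nodes, skip the costly lifting step
            self.files[path] = (digest, encode_file(source))
            touched.append(path)
        return touched


if __name__ == "__main__":
    repo = {
        "a.py": 'def load():\n    """Read config from disk."""\n    return parse()\n',
        "b.py": "def parse():\n    return {}\n",
    }
    g = RepoGraph()
    print(g.update(repo))  # ['a.py', 'b.py']
    repo["b.py"] = "def parse():\n    return {'x': 1}\n"
    print(g.update(repo))  # ['b.py']
```

Hashing is still a linear pass over all files, but it is cheap; the expensive step (semantic lifting) runs only for changed files, which is one way maintenance cost can track the size of a change rather than the size of the repository. A real system might take the changed-file list from version-control diffs instead of hashing.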