Repository-aware coding agents often struggle to recover build and test structure, especially in multilingual projects where cross-language dependencies are encoded across heterogeneous build systems and tooling. We introduce the Repository Intelligence Graph (RIG), a deterministic, evidence-backed architectural map that represents buildable components, aggregators, runners, tests, external packages, and package managers, connected by explicit dependency and coverage edges that trace back to concrete build and test definitions. We also present SPADE, a deterministic extractor that constructs a RIG from build and test artifacts (currently through an automatic CMake plugin based on the CMake File API and CTest metadata) and exposes it as an LLM-friendly JSON view that agents can treat as the authoritative description of repository structure. We evaluate three commercial agents (Claude Code, Cursor, Codex) on eight repositories spanning low to high build-oriented complexity, including the real-world MetaFFI project. Each agent answers thirty structured questions per repository with and without RIG in context, and we measure accuracy, wall-clock completion time, and efficiency (seconds per correct answer). Across repositories and agents, providing RIG improves mean accuracy by 12.2\% and reduces completion time by 53.9\%, yielding a mean 57.8\% reduction in seconds per correct answer. Gains are larger in multilingual repositories, which improve by 17.7\% in accuracy and 69.5\% in efficiency on average, compared to 6.6\% and 46.1\% in single-language repositories. Qualitative analysis suggests that RIG shifts failures from structural misunderstandings toward reasoning mistakes over a correct structure, while rare regressions highlight that graph-based reasoning quality remains a key factor.
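The efficiency metric named above (seconds per correct answer) and its relative reduction can be sketched as follows. This is a minimal illustration, not the evaluation code used in the study: the function names are hypothetical, and the timings and answer counts are made-up values chosen only to make the arithmetic easy to follow.

```python
def seconds_per_correct(total_seconds: float, correct_answers: int) -> float:
    """Efficiency as wall-clock seconds spent per correctly answered question."""
    if correct_answers <= 0:
        raise ValueError("efficiency is undefined without a correct answer")
    return total_seconds / correct_answers


def relative_reduction(baseline: float, treated: float) -> float:
    """Fractional reduction of `treated` relative to `baseline` (0.6 means 60% lower)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - treated) / baseline


# Hypothetical run on one repository with thirty questions (illustrative values only).
without_rig = seconds_per_correct(total_seconds=600.0, correct_answers=24)
with_rig = seconds_per_correct(total_seconds=270.0, correct_answers=27)
reduction = relative_reduction(without_rig, with_rig)

print(f"without RIG: {without_rig:.1f} s per correct answer")
print(f"with RIG:    {with_rig:.1f} s per correct answer")
print(f"reduction:   {reduction:.1%}")
```

Because the metric is a ratio, a gain in accuracy and a drop in completion time compound: in this toy case the agent is both faster and more often right, so the seconds-per-correct figure falls more than completion time alone.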