Prometheus: Towards Long-Horizon Codebase Navigation for Repository-Level Problem Solving

Large Language Models (LLMs) have shown remarkable capabilities in automating software engineering tasks, spurring the emergence of coding agents that scaffold LLMs with external tools to resolve repository-level problems. However, existing agents still struggle to navigate large-scale codebases, because the Needle-in-a-Haystack problem persists even with million-token context windows: relevant evidence is often overwhelmed by large volumes of irrelevant code and documentation. Prior codebase navigation approaches, including embedding-based retrieval, file-system exploration, and graph-based retrieval, address parts of this challenge but fail to capture the temporal continuity of agent reasoning. This leaves agents stateless and causes repeated repository traversals that hinder scalable planning and reasoning. To address these limitations, we present Prometheus, a memory-centric coding agent framework for long-horizon codebase navigation. Prometheus represents the repository as a unified knowledge graph to encode semantic dependencies and employs a context engine augmented with working memory that retains and reuses previously explored contexts to ensure continuity across reasoning steps. Built upon this engine, Prometheus integrates memory-enhanced navigation into a multi-agent system for automated issue resolution, encompassing issue classification, bug reproduction, patch generation, and verification. We evaluate Prometheus on two widely used issue resolution benchmarks, SWE-bench Verified and SWE-PolyBench Verified. Powered by GPT-5, Prometheus achieves state-of-the-art performance with resolution rates of 74.4% and 33.8% on the two benchmarks, ranking Top-6 and Top-1 among open-source agent systems, respectively. Our data and code are available at https://github.com/EuniAI/Prometheus.
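
To illustrate the idea of a context engine with working memory, the following is a minimal sketch and not the Prometheus implementation. The names KnowledgeGraph, ContextEngine, explore, and traversals, as well as the toy file names, are hypothetical and chosen only for illustration. The repository graph is reduced to a plain adjacency mapping, and working memory to a dictionary that is consulted before any graph lookup.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Toy repository graph (assumption): node name -> names it depends on."""
    edges: dict[str, list[str]] = field(default_factory=dict)

    def neighbors(self, node: str) -> list[str]:
        # Unknown nodes simply have no recorded dependencies.
        return self.edges.get(node, [])


@dataclass
class ContextEngine:
    """Hypothetical engine that checks working memory before the graph."""
    graph: KnowledgeGraph
    memory: dict[str, list[str]] = field(default_factory=dict)
    traversals: int = 0  # graph lookups that memory could not serve

    def explore(self, node: str) -> list[str]:
        if node in self.memory:
            # Reuse a context retained from an earlier reasoning step.
            return self.memory[node]
        self.traversals += 1
        context = self.graph.neighbors(node)
        self.memory[node] = context  # retain for later steps
        return context


graph = KnowledgeGraph({"app.py": ["utils.py", "models.py"], "utils.py": []})
engine = ContextEngine(graph)
first = engine.explore("app.py")   # served by the graph
second = engine.explore("app.py")  # served by working memory
```

In this sketch the second query for the same node is answered from memory, so the traversal counter stays at one. A stateless agent, by contrast, would query the graph again on every step.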
