Retrieving the functions, classes, or files in a large codebase that are relevant to a given user query, bug report, or feature request is a fundamental challenge for Large Language Model (LLM)-based coding agents. Agentic approaches typically employ sparse retrieval methods such as BM25 or dense embedding strategies to identify semantically relevant code units. While embedding-based approaches can outperform BM25 by large margins, they often do not account for the underlying graph structure of the codebase. To address this, we propose SpIDER (Spatially Informed Dense Embedding Retrieval), an enhanced dense retrieval approach that integrates LLM-based reasoning with auxiliary information obtained from graph-based exploration of the codebase. We further introduce SpIDER-Bench, a graph-structured evaluation benchmark curated from SWE-PolyBench, SWEBench-Verified and Multi-SWEBench, spanning codebases in Python, Java, JavaScript, and TypeScript. Empirical results show that SpIDER consistently improves dense retrieval performance by at least 13% across programming languages and benchmarks in SpIDER-Bench.
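To make concrete the general idea of combining similarity scores with a codebase graph, the following is a minimal sketch and not the SpIDER method itself, whose details the abstract does not give. Everything in it is an assumption made for illustration: the `embed` function is a toy bag-of-words stand-in for a real dense encoder, the `units` and `graph` data are invented, and the `neighbor_weight` scoring rule is a hypothetical way to let a unit inherit relevance from its graph neighbors.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a dense encoder (assumption): bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, units, graph, top_k=2, neighbor_weight=0.5):
    """Rank code units by their own similarity plus a boost from neighbors.

    The boost rule (best neighbor score times neighbor_weight) is a
    hypothetical choice for this sketch, not taken from the source.
    """
    q = embed(query)
    base = {name: cosine(q, embed(text)) for name, text in units.items()}
    scores = {}
    for name in units:
        neighbors = graph.get(name, [])
        boost = max((base[n] for n in neighbors), default=0.0)
        scores[name] = base[name] + neighbor_weight * boost
    # Sort by descending score, breaking ties by name for determinism.
    return sorted(scores, key=lambda n: (-scores[n], n))[:top_k]


# Invented example data: unit descriptions and a call/import graph.
units = {
    "parse_config": "parse config file and load settings",
    "load_yaml": "read yaml document from disk",
    "render_page": "render html page template",
}
graph = {
    "parse_config": ["load_yaml"],
    "load_yaml": ["parse_config"],
    "render_page": [],
}
query = "config file settings fail to load"

ranked = retrieve(query, units, graph)
```

In this toy setup, `load_yaml` shares no words with the query, so similarity alone gives it a score of zero; it is ranked second only because it is adjacent to `parse_config` in the graph. That is the kind of structural signal a purely embedding-based ranking would miss.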