SpIDER: Spatially Informed Dense Embedding Retrieval for Software Issue Localization

Retrieving the code functions, classes, or files needed to resolve a user query, bug report, or feature request from a large codebase is a fundamental challenge for Large Language Model (LLM)-based coding agents. Agentic approaches typically rely on sparse retrieval methods such as BM25 or on dense embedding strategies to identify semantically relevant code units. While embedding-based approaches can outperform BM25 by large margins, they often ignore the underlying graph structure of the codebase. To address this, we propose SpIDER (Spatially Informed Dense Embedding Retrieval), an enhanced dense retrieval approach that combines LLM-based reasoning with auxiliary information obtained from graph-based exploration of the codebase. We further introduce SpIDER-Bench, a graph-structured evaluation benchmark curated from SWE-PolyBench, SWEBench-Verified and Multi-SWEBench, spanning codebases in Python, Java, JavaScript and TypeScript. Empirical results show that SpIDER consistently improves dense retrieval performance by at least 13% across the programming languages and benchmarks in SpIDER-Bench.
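The abstract does not spell out how graph information enters the ranking, so the following is only a toy illustration of the general idea, not the SpIDER algorithm: rank code units by cosine similarity to the query, then let graph neighbors of the top-ranked units join the candidate pool with a small bonus. The function name `retrieve_with_neighbors`, the `neighbor_weight` parameter, the hand-written embeddings, and the call graph are all hypothetical, and the LLM-based reasoning step is left out entirely.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_with_neighbors(query_vec, embeddings, graph, top_k=2, neighbor_weight=0.25):
    """Hypothetical sketch: dense retrieval followed by one-hop graph expansion.

    embeddings: dict mapping a code-unit name to its embedding vector.
    graph: dict mapping a code-unit name to the names it is linked to
           (for example calls or imports); an assumption for illustration.
    """
    # Step 1: plain dense retrieval, scoring every unit against the query.
    base = {unit: cosine(query_vec, vec) for unit, vec in embeddings.items()}
    seeds = sorted(base, key=base.get, reverse=True)[:top_k]

    # Step 2: seeds keep their own similarity score.
    scores = {unit: base[unit] for unit in seeds}

    # Step 3: one-hop neighbors of a seed enter the pool with their own
    # similarity plus a fraction of the seed's similarity.
    for seed in seeds:
        for neighbor in graph.get(seed, ()):
            if neighbor not in embeddings or neighbor in seeds:
                continue
            boosted = base[neighbor] + neighbor_weight * base[seed]
            scores[neighbor] = max(scores.get(neighbor, float("-inf")), boosted)

    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: two-dimensional "embeddings" and a tiny call graph (both made up).
embeddings = {
    "parser.parse": [1.0, 0.0],
    "parser.tokenize": [0.6, 0.8],
    "utils.log": [0.0, 1.0],
}
graph = {"parser.parse": ["parser.tokenize"]}

ranked = retrieve_with_neighbors([1.0, 0.0], embeddings, graph, top_k=1)
for unit, score in ranked:
    print(f"{unit}: {score:.2f}")
```

With `top_k=1`, similarity alone would return only `parser.parse`; the graph step also surfaces `parser.tokenize` because it is linked to the top hit, while the unrelated `utils.log` stays out of the results.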

