SPI: Query-Depth-Adaptive Indexing for Streaming RAG in Vector Databases

Vector databases (VecDBs) are increasingly deployed in retrieval-augmented generation (RAG) pipelines where query processing and document ingestion occur concurrently. The index layer must provide low-latency search while incorporating new vectors without frequent global rebuilds. Existing VecDB pipelines typically search a single, uniform representation for every query, even though the semantic granularity required varies substantially across queries. This motivates an index design that supports incremental updates while adapting retrieval depth to query distribution and complexity. We propose \textbf{Semantic Pyramid Indexing (SPI)}, a VecDB-layer indexing framework that organizes embeddings into $L$ semantically aligned resolution levels and selects retrieval depth per query via a lightweight uncertainty-aware controller. SPI supports progressive coarse-to-fine ANN search, level-wise streaming insertion without global rebuilds, and distributed execution through LSH partitioning with asynchronous gRPC coordination. Unlike hierarchical ANN structures with fixed traversal rules (e.g., SPANN), SPI adapts resolution at query time while remaining compatible with FAISS and Qdrant backends. On MS MARCO and Natural Questions, SPI achieves competitive Recall@10 with lower latency under the same dense encoder family, yielding a \textbf{1.4--2.3$\times$} average retrieval latency reduction under fixed Recall@10 targets relative to comparable approximate-ANN baselines. A prototype scaling study up to 8 nodes shows $6.2\times$ throughput scaling (${\approx}73\%$ efficiency); the 16-node configuration is included for completeness but shows diminishing efficiency. We also provide a top-$K$ stability guarantee: a query with sufficient retrieval margin returns the same top-$K$ set at a shallower level as at full depth. Code and configurations are available at https://github.com/FastLM/SPI_VecDB.
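To make the idea of margin-gated, coarse-to-fine retrieval concrete, the following is a minimal toy sketch and not the SPI implementation. Several things in it are assumptions made purely for illustration: each pyramid level is modeled as a prefix of the embedding coordinates, brute-force dot-product scoring stands in for ANN search, and the "controller" is a simple threshold on the score gap between the $K$-th and $(K+1)$-th candidates. The names `progressive_search`, `level_dims`, `margin_threshold`, and `shortlist` are hypothetical and do not come from the SPI code base.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def score_level(query, docs, candidates, dim) -> List[Tuple[float, int]]:
    """Score candidates using only the first `dim` coordinates.

    Assumption: one pyramid level == one prefix length. Brute force stands in
    for an ANN index here.
    """
    q = query[:dim]
    scored = [(dot(q, docs[i][:dim]), i) for i in candidates]
    # Highest score first; ties broken by document id for determinism.
    return sorted(scored, key=lambda s: (-s[0], s[1]))


def progressive_search(query, docs, level_dims, k, margin_threshold, shortlist):
    """Return (top-k ids, index of the level at which the search stopped)."""
    if not level_dims:
        raise ValueError("level_dims must contain at least one level")
    candidates = list(range(len(docs)))
    for level, dim in enumerate(level_dims):
        scored = score_level(query, docs, candidates, dim)
        is_last = level == len(level_dims) - 1
        # Margin: gap between the k-th and (k+1)-th scores at this level.
        if len(scored) > k:
            margin = scored[k - 1][0] - scored[k][0]
        else:
            margin = float("inf")
        if is_last or margin >= margin_threshold:
            return [i for _, i in scored[:k]], level
        # Margin too small: keep a shortlist and refine at the next level.
        candidates = [i for _, i in scored[:shortlist]]
    raise AssertionError("unreachable")


# Hypothetical 4-dimensional corpus with two levels: 2 dims, then all 4 dims.
DOCS = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.9, 0.5, 0.0],
    [0.0, 0.8, 0.0, 0.9],
]
LEVEL_DIMS = [2, 4]

easy = progressive_search([1.0, 0.0, 0.0, 0.0], DOCS, LEVEL_DIMS,
                          k=2, margin_threshold=0.5, shortlist=4)
hard = progressive_search([0.0, 1.0, 0.0, 1.0], DOCS, LEVEL_DIMS,
                          k=2, margin_threshold=0.5, shortlist=4)
```

In this toy corpus the first query has a wide margin at the coarse level and stops there, while the second has a narrow margin and is refined at the full-resolution level, where its top-2 set changes. This mirrors, in a very simplified way, the intuition behind the stability claim: stopping early is only safe when the margin is large enough.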
