Hybrid queries that combine high-dimensional vector similarity search with spatio-temporal filters are increasingly critical for modern retrieval-augmented generation (RAG) systems. Existing systems typically handle these workloads by nesting vector indices within low-dimensional spatial structures, such as R-trees. However, this decoupled architecture fragments the vector space, forcing the query engine to invoke multiple disjoint sub-indices per query. The fragmentation destroys graph routing connectivity and incurs severe traversal overhead, and the resulting design is difficult to optimize for complex spatial boundaries. In this paper, we propose CubeGraph, a novel indexing framework designed to natively integrate vector search with arbitrary spatial constraints. CubeGraph partitions the spatial domain using a hierarchical grid and maintains a modular vector graph within each cell. During query execution, CubeGraph stitches together adjacent cube-level indices on the fly whenever their spatial cells intersect the query filter. This dynamic graph integration restores global connectivity, enabling a unified, single-pass nearest-neighbor traversal that eliminates the overhead of fragmented sub-index invocations. Extensive evaluations on real-world datasets demonstrate that CubeGraph significantly outperforms state-of-the-art baselines, offering superior query performance, scalability, and flexibility for complex hybrid workloads.
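To make the cell-selection step concrete, the following is a minimal sketch of how the cells that intersect a query filter, and the adjacent pairs among them that would be candidates for stitching, might be enumerated. It is an illustration under simplifying assumptions and not CubeGraph's actual implementation: it assumes a single-level uniform grid rather than the hierarchical one, an axis-aligned box as the filter rather than an arbitrary region, and face-sharing as the notion of adjacency. The names `cells_intersecting`, `adjacent_pairs`, and `cell_size` are hypothetical.

```python
from itertools import product
from math import floor


def cells_intersecting(box, cell_size):
    """Return integer grid coordinates of cells that overlap an axis-aligned box.

    Assumption (not from the paper): a uniform, single-level grid in which
    cell i along an axis covers [i * cell_size, (i + 1) * cell_size).
    `box` is (lower_corner, upper_corner) and is treated as closed, so a cell
    that only touches the upper boundary is still included.
    """
    lower, upper = box
    per_axis = [
        range(floor(lo / cell_size), floor(hi / cell_size) + 1)
        for lo, hi in zip(lower, upper)
    ]
    return set(product(*per_axis))


def adjacent_pairs(cells):
    """Return pairs of selected cells that share a face.

    Assumption: these pairs are where per-cell graphs would be linked at
    query time. How the links are chosen is not specified here.
    """
    pairs = set()
    for cell in cells:
        for axis in range(len(cell)):
            # Look only in the +1 direction so each pair is emitted once.
            neighbor = cell[:axis] + (cell[axis] + 1,) + cell[axis + 1:]
            if neighbor in cells:
                pairs.add((cell, neighbor))
    return pairs


if __name__ == "__main__":
    query_box = ((0.5, 0.5), (1.5, 2.5))  # hypothetical 2-D filter
    selected = cells_intersecting(query_box, cell_size=1.0)
    print(sorted(selected))
    print(sorted(adjacent_pairs(selected)))
```

The sketch works for any number of dimensions, so a time axis can be treated as a third coordinate. Only the selected cells and their adjacent pairs are touched, which reflects the idea that stitching is restricted to the region covered by the filter.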