Hybrid queries that combine high-dimensional vector similarity search with spatio-temporal filters are increasingly critical for modern retrieval-augmented generation (RAG) systems. Existing systems typically handle these workloads by nesting vector indices within low-dimensional spatial structures such as R-trees. However, this decoupled architecture fragments the vector space, forcing the query engine to invoke multiple disjoint sub-indices per query. The fragmentation destroys graph routing connectivity, incurs severe traversal overhead, and makes it difficult to optimize for complex spatial boundaries. In this paper, we propose CubeGraph, a novel indexing framework that natively integrates vector search with arbitrary spatial constraints. CubeGraph partitions the spatial domain using a hierarchical grid and maintains a modular vector graph within each cell. During query execution, CubeGraph dynamically stitches together adjacent cube-level indices whose spatial cells intersect the query filter. This dynamic graph integration restores global connectivity, enabling a unified, single-pass nearest-neighbor traversal that eliminates the overhead of fragmented sub-index invocations. Extensive evaluations on real-world datasets demonstrate that CubeGraph significantly outperforms state-of-the-art baselines, offering superior query execution performance, scalability, and flexibility for complex hybrid workloads.
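To make the query-time stitching idea concrete, the following is a minimal sketch and not the CubeGraph implementation. It assumes a single-level uniform 2D grid in place of the hierarchical grid, a rectangular filter, and a simple incremental proximity graph per cell. The names `GridGraphIndex`, `search`, `cell_size`, `m`, and `ef` are hypothetical, as is the rule of bridging the first-inserted point of each pair of adjacent cells.

```python
import heapq
import math
from collections import defaultdict


class GridGraphIndex:
    """Toy uniform grid holding one small proximity graph per cell (illustrative only)."""

    def __init__(self, vectors, coords, cell_size=1.0, m=4):
        self.vectors, self.coords, self.cell_size = vectors, coords, cell_size
        self.cells = defaultdict(list)  # (cx, cy) -> point ids located in that cell
        self.edges = defaultdict(set)   # point id -> intra-cell graph neighbours
        for pid, vec in enumerate(vectors):
            members = self.cells[self._cell_of(coords[pid])]
            # Link the new point to its m closest predecessors in the same cell.
            # Bidirectional edges keep every per-cell graph connected.
            nearest = sorted(members, key=lambda o: math.dist(vectors[o], vec))[:m]
            for other in nearest:
                self.edges[pid].add(other)
                self.edges[other].add(pid)
            members.append(pid)

    def _cell_of(self, xy):
        return (math.floor(xy[0] / self.cell_size), math.floor(xy[1] / self.cell_size))

    def search(self, query, box, k=5, ef=32):
        """Return up to k ids nearest to query whose coordinates lie inside box."""
        (x0, y0), (x1, y1) = box
        lo, hi = self._cell_of((x0, y0)), self._cell_of((x1, y1))
        # Cells whose extent intersects the rectangular filter.
        active = {c for c in self.cells
                  if lo[0] <= c[0] <= hi[0] and lo[1] <= c[1] <= hi[1]}

        # Stitching step (assumed rule): bridge the entry points, here the
        # first-inserted point, of every pair of adjacent active cells.
        bridges = defaultdict(set)
        for (cx, cy) in active:
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb != (cx, cy) and nb in active:
                        bridges[self.cells[(cx, cy)][0]].add(self.cells[nb][0])

        def dist(p):
            return math.dist(self.vectors[p], query)

        def inside(p):
            x, y = self.coords[p]
            return x0 <= x <= x1 and y0 <= y <= y1

        # One best-first traversal over the stitched graph. Seeding every active
        # cell's entry guards against empty cells cutting off part of the region.
        seeds = [self.cells[c][0] for c in active]
        visited = set(seeds)
        candidates = [(dist(p), p) for p in seeds]
        heapq.heapify(candidates)
        results = []  # max-heap (negated distance) of points that pass the filter
        while candidates:
            d, p = heapq.heappop(candidates)
            if len(results) >= ef and d > -results[0][0]:
                break  # closest open candidate cannot improve the result set
            if inside(p):
                heapq.heappush(results, (-d, p))
                if len(results) > ef:
                    heapq.heappop(results)
            for q in self.edges[p] | bridges[p]:
                if q not in visited:
                    visited.add(q)
                    heapq.heappush(candidates, (dist(q), q))
        return [p for _, p in sorted((-nd, p) for nd, p in results)[:k]]
```

A call such as `GridGraphIndex(vectors, coords).search(query, ((0.5, 0.5), (2.5, 3.0)), k=5)` runs a single traversal with one shared candidate queue across all intersecting cells, rather than querying each cell's graph separately and merging the results. With a small `ef` the search is approximate; in this sketch it becomes exhaustive over the active cells once `ef` exceeds the number of points that pass the filter.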