On-disk graph-based vector search (GVS) has become the dominant approach for serving large-scale vector databases at high recall, but prior systems struggle to sustain concurrent search and update throughput on high-dimensional workloads. We identify the main cause as position seeking: the full graph traversal that every update performs to locate a new vector's neighbors before linking it into the graph. Position seeking is inherently heavier than a search query, and two systemic limitations of current GVS systems amplify its cost further: packed layouts that couple every edge fetch to a full vector load, and a static entrance graph whose entry points drift away from newly inserted regions as updates accumulate. We present NAVIS, an on-SSD GVS system that reduces position-seeking overhead through (i) a layout-supported selective vector read that breaks the packed-page coupling without losing its locality benefits, (ii) a lightweight dynamic entrance-graph update mechanism that reuses traversal information already produced by concurrent updates, and (iii) an entrance-graph-aware edge-list cache that concentrates capacity on high-reuse paths near refreshed entry points. Across multiple large-scale, high-dimensional benchmarks, NAVIS improves average insertion throughput by up to 2.74× and average concurrent search throughput by up to 1.37×, while reducing average search latency by up to 25.26%.
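To make the cost of position seeking concrete, the sketch below shows a generic in-memory version of the insertion path described above. It runs a greedy best-first (beam) traversal from an entry point and then links the new vector to its closest candidates. It is not NAVIS's implementation. The function names `beam_search` and `insert`, the `beam_width` and `max_degree` parameters, the Euclidean metric, and the naive distance-based truncation of reverse edges (used in place of a proper pruning rule such as Vamana's RobustPrune) are all illustrative assumptions. The `loads` counter marks each point where an on-disk system would fetch a full vector. In a packed layout, that fetch arrives together with the neighbor's edge list.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Graph = Dict[int, List[int]]


def beam_search(graph: Graph, vectors: Dict[int, Vector], entry: int,
                query: Vector, beam_width: int) -> Tuple[List[Tuple[float, int]], int]:
    """Greedy best-first traversal from `entry` (illustrative sketch).

    Returns the (distance, id) pairs of the `beam_width` closest nodes found,
    sorted by distance, and the number of full vector loads performed.
    """
    visited = {entry}
    loads = 1  # the entry point's vector
    d0 = math.dist(vectors[entry], query)
    frontier = [(d0, entry)]   # min-heap: nodes still to expand
    beam = [(-d0, entry)]      # max-heap (negated): closest nodes seen so far
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(beam) >= beam_width and d > -beam[0][0]:
            break  # nearest unexpanded node cannot improve the beam
        for nbr in graph[node]:  # edge-list fetch for `node`
            if nbr in visited:
                continue
            visited.add(nbr)
            # Full vector load; in a packed on-disk layout this read also
            # brings in nbr's edge list from the same page.
            dn = math.dist(vectors[nbr], query)
            loads += 1
            if len(beam) < beam_width or dn < -beam[0][0]:
                heapq.heappush(frontier, (dn, nbr))
                heapq.heappush(beam, (-dn, nbr))
                if len(beam) > beam_width:
                    heapq.heappop(beam)  # evict the farthest candidate
    return sorted((-nd, n) for nd, n in beam), loads


def insert(graph: Graph, vectors: Dict[int, Vector], entry: int, new_id: int,
           new_vec: Vector, beam_width: int, max_degree: int) -> int:
    """Position seeking followed by linking; returns the traversal's vector loads."""
    # Position seeking: the same traversal a query performs, run before any linking.
    candidates, loads = beam_search(graph, vectors, entry, new_vec, beam_width)
    vectors[new_id] = new_vec
    neighbors = [n for _, n in candidates[:max_degree]]
    graph[new_id] = list(neighbors)
    for n in neighbors:
        graph[n].append(new_id)  # reverse edge
        if len(graph[n]) > max_degree:
            # Hypothetical naive degree bound: keep the max_degree closest neighbors.
            graph[n].sort(key=lambda m: math.dist(vectors[m], vectors[n]))
            del graph[n][max_degree:]
    return loads


if __name__ == "__main__":
    # Toy 1-D dataset: ten points on a line, linked as a path graph.
    vectors: Dict[int, Vector] = {i: [float(i)] for i in range(10)}
    graph: Graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
    loads = insert(graph, vectors, entry=0, new_id=10, new_vec=[7.4],
                   beam_width=3, max_degree=3)
    print(sorted(graph[10]), loads)  # prints [6, 7, 8] 10
```

In this toy run, the entry point sits far from the inserted region, so the traversal loads every vector in the graph before it can link the new point. This is the kind of cost that entry-point drift can inflate. Every update pays for this traversal before writing any edge. If insertion also uses a wider beam than search, as many graph-index build procedures do, the vector loads per update grow further.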