We present GateANN, an I/O-efficient, SSD-based graph system for approximate nearest neighbor search (ANNS) that supports filtered vector search on an unmodified graph index. Existing SSD-based systems either waste I/O by post-filtering or require expensive filter-aware index rebuilds. GateANN avoids both by decoupling graph traversal from vector retrieval. Our key insight is that traversing a node requires only its neighbor list and an approximate distance, neither of which needs the full-precision vector stored on SSD. Based on this insight, GateANN introduces graph tunneling: it checks each node's filter predicate in memory before issuing I/O and routes through non-matching nodes entirely in memory, preserving graph connectivity without any SSD reads for those nodes. Our experimental results show that GateANN reduces SSD reads by up to 10x and improves throughput by up to 7.6x.
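The tunneling idea described above can be illustrated with a minimal sketch. This is not the GateANN implementation: every name here (`filtered_search`, `NEIGHBORS`, `APPROX_VECS`, `LABELS`, `SSD_VECS`) is hypothetical, the "SSD" is a plain dictionary with a read counter, the approximate vectors are simply rounded coordinates standing in for whatever compressed representation a real system keeps in memory, and the fixed expansion budget is a simplification of a real termination rule. The sketch assumes neighbor lists, approximate vectors, and filter labels all fit in memory.

```python
import heapq

# Toy data (all hypothetical). In-memory structures: neighbor lists,
# approximate vectors, and one filter label per node.
NEIGHBORS = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
APPROX_VECS = {0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (3, 0), 4: (4, 0), 5: (5, 0)}
LABELS = {0: "a", 1: "b", 2: "b", 3: "b", 4: "a", 5: "a"}
# Simulated SSD: full-precision vectors, touched only through a counted read.
SSD_VECS = {0: (0.0, 0.0), 1: (1.1, 0.0), 2: (2.0, 0.1),
            3: (3.1, 0.0), 4: (3.9, 0.1), 5: (5.0, 0.0)}


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filtered_search(query, entry, wanted, k=2, budget=6, tunnel=True):
    """Best-first graph search returning (top-k matching node ids, SSD reads).

    tunnel=True : non-matching nodes are expanded from memory, with no read.
    tunnel=False: post-filtering baseline, every expanded node is read.
    """
    ssd_reads = 0
    visited = {entry}
    frontier = [(sq_dist(query, APPROX_VECS[entry]), entry)]
    results = []  # (exact distance, node id) for filter-matching nodes
    expanded = 0
    while frontier and expanded < budget:
        _, node = heapq.heappop(frontier)
        expanded += 1
        matches = LABELS[node] == wanted  # predicate checked in memory
        if matches or not tunnel:
            full_vec = SSD_VECS[node]     # simulated SSD read
            ssd_reads += 1
            if matches:
                results.append((sq_dist(query, full_vec), node))
        # Traversal uses only in-memory neighbor lists and approximate
        # distances, so connectivity through non-matching nodes is kept.
        for nb in NEIGHBORS[node]:
            if nb not in visited:
                visited.add(nb)
                heapq.heappush(frontier, (sq_dist(query, APPROX_VECS[nb]), nb))
    results.sort()
    return [node for _, node in results[:k]], ssd_reads


print(filtered_search((4.2, 0.0), entry=0, wanted="a", tunnel=True))
print(filtered_search((4.2, 0.0), entry=0, wanted="a", tunnel=False))
```

On this toy chain graph, both variants return the same two matching nodes, but the tunneling variant issues 3 simulated reads while the post-filtering baseline issues 6, because nodes 1 to 3 fail the filter and are traversed purely in memory. The numbers say nothing about the system's real performance; they only show where the saved reads come from.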