The GPU-accelerated Inverted File (IVF) index is one of the industry standards for large-scale vector search, but it relies on static VRAM layouts that hinder real-time mutability. Our benchmark and analysis reveal that existing GPU IVF designs require expensive CPU-GPU data transfers for index updates, causing system latency to spike from milliseconds to seconds in streaming scenarios. We present SIVF, a GPU-native index that enables high-velocity, in-place mutation through new data structures and algorithms, including conflict-free slab allocation and coalesced search on non-contiguous memory. SIVF has been implemented and integrated into the open-source vector search library Faiss. Evaluation against baselines on diverse vector datasets demonstrates that SIVF reduces deletion latency by orders of magnitude compared to the state of the art. Furthermore, distributed experiments on a 12-GPU cluster show that SIVF exhibits near-perfect linear scalability, achieving an aggregate ingestion throughput of 4.07 million vectors/s and a deletion throughput of 108.5 million vectors/s.