The GPU-accelerated Inverted File (IVF) index is one of the industry standards for large-scale vector analytics, but it relies on static VRAM layouts that hinder real-time mutability. Our benchmarks and analysis reveal that existing GPU IVF designs require expensive CPU-GPU data transfers for index updates, causing system latency to spike from milliseconds to seconds in streaming scenarios. We present SIVF, a GPU-native index that enables high-velocity, in-place mutation through new data structures and algorithms, including conflict-free slab allocation and coalesced search over non-contiguous memory. SIVF has been implemented and integrated into Faiss, the open-source vector search library. Evaluation against baselines on diverse vector datasets demonstrates that SIVF reduces deletion latency by orders of magnitude. Furthermore, distributed experiments on a 12-GPU cluster show that SIVF exhibits near-perfect linear scalability, achieving an aggregate ingestion throughput of 4.07 million vectors/s and a deletion throughput of 108.5 million vectors/s.
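To make the idea of slab-based, in-place mutation concrete, the following is a minimal CPU-only sketch in Python. It is not the SIVF implementation and does not model GPU concurrency or memory coalescing; it only illustrates, under our own assumptions, how inverted lists built from fixed-size slabs in a preallocated pool can support insertion and deletion without rebuilding or copying the whole list. All names (`SlabIndex`, `SLAB_SIZE`, `sq_dist`) and the swap-with-last deletion policy are hypothetical choices made for illustration.

```python
SLAB_SIZE = 4  # vectors per slab (hypothetical; a real GPU slab would be much larger)


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class SlabIndex:
    """Toy IVF index whose inverted lists are chains of fixed-size slabs."""

    def __init__(self, centroids, num_slabs):
        self.centroids = [tuple(c) for c in centroids]
        # Preallocated pool of slots; stands in for a fixed VRAM arena.
        self.pool = [[None] * SLAB_SIZE for _ in range(num_slabs)]
        self.free = list(range(num_slabs))          # stack of unused slab numbers
        self.slabs = [[] for _ in self.centroids]   # per-list chain of slab numbers
        self.counts = [0] * len(self.centroids)     # live vectors per list
        self.where = {}                             # id -> (list number, position)

    def _slot(self, lst, pos):
        # Map a logical position in a list to (slab number, offset in slab).
        return self.slabs[lst][pos // SLAB_SIZE], pos % SLAB_SIZE

    def add(self, vid, vec):
        vec = tuple(vec)
        lst = min(range(len(self.centroids)),
                  key=lambda c: sq_dist(self.centroids[c], vec))
        pos = self.counts[lst]
        if pos == len(self.slabs[lst]) * SLAB_SIZE:
            # Chain is full: take one slab from the pool (IndexError if exhausted).
            self.slabs[lst].append(self.free.pop())
        s, o = self._slot(lst, pos)
        self.pool[s][o] = (vid, vec)
        self.where[vid] = (lst, pos)
        self.counts[lst] += 1

    def remove(self, vid):
        lst, pos = self.where.pop(vid)
        last = self.counts[lst] - 1
        s, o = self._slot(lst, pos)
        ls, lo = self._slot(lst, last)
        if pos != last:
            # Fill the hole with the list's last vector so the list stays dense.
            moved = self.pool[ls][lo]
            self.pool[s][o] = moved
            self.where[moved[0]] = (lst, pos)
        self.pool[ls][lo] = None
        self.counts[lst] = last
        if last % SLAB_SIZE == 0:
            # The tail slab is now empty: return it to the pool.
            self.free.append(self.slabs[lst].pop())

    def search(self, query, nprobe=1):
        """Return the id of the nearest stored vector among the probed lists."""
        query = tuple(query)
        probe = sorted(range(len(self.centroids)),
                       key=lambda c: sq_dist(self.centroids[c], query))[:nprobe]
        best = (float("inf"), None)
        for lst in probe:
            remaining = self.counts[lst]
            for s in self.slabs[lst]:  # slabs need not be adjacent in the pool
                for vid, vec in self.pool[s][:min(SLAB_SIZE, remaining)]:
                    best = min(best, (sq_dist(vec, query), vid), key=lambda t: t[0])
                remaining -= SLAB_SIZE
        return best[1]
```

In this sketch a deletion touches at most two slots and one free-list entry, regardless of list length, which is the property that makes in-place mutation cheap compared with rebuilding a contiguous list. How SIVF achieves conflict-free allocation and coalesced access on the GPU is described in the paper itself and is not captured here.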