Efficient vector search is essential for powering large-scale AI applications, such as LLMs. Existing solutions are designed for monolithic architectures in which compute and memory are tightly coupled. Disaggregated architectures break this coupling by separating computation and memory resources into independently scalable pools to improve utilization. However, running a vector database on a disaggregated memory system brings unique system-design challenges because of its graph-based index. We present d-HNSW, the first RDMA-based vector search engine optimized for disaggregated memory systems. d-HNSW preserves HNSW's high accuracy while addressing the new system-level challenges introduced by disaggregation: 1) network inefficiency from pointer-chasing traversals, 2) non-contiguous remote memory layout induced by dynamic insertions, 3) redundant data transfers in batch workloads, and 4) resource underutilization due to sequential execution. d-HNSW tackles these challenges through a set of hardware-algorithm co-designed techniques: 1) balanced clustering with a lightweight representative index to reduce network round-trips and ensure predictable latency, 2) an RDMA-friendly graph layout that preserves data contiguity under dynamic insertions, 3) query-aware data loading to eliminate redundant fetches across batch queries, and 4) a pipelined execution model that overlaps RDMA transfers with computation to hide network latency and improve throughput. Our evaluation in a public cloud shows that d-HNSW achieves up to <10^-2x the query latency and >100x the query throughput of other baselines, while maintaining a high recall of 94%.
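The abstract names query-aware data loading without giving details, so the following is only a minimal sketch of the general idea, not d-HNSW's implementation. It assumes each query is routed to its `n_probe` nearest partitions through a small table of representative vectors, and that every partition needed by a batch is fetched from remote memory once and shared by all queries that need it. The names `nearest_partitions`, `plan_batch_loads`, `search_batch`, and `fetch_partition` are hypothetical; `fetch_partition` stands in for an RDMA read, and a brute-force scan stands in for the per-partition HNSW search.

```python
from math import dist


def nearest_partitions(query, representatives, n_probe):
    """Return ids of the n_probe partitions whose representative is closest."""
    ranked = sorted(range(len(representatives)),
                    key=lambda pid: dist(query, representatives[pid]))
    return ranked[:n_probe]


def plan_batch_loads(queries, representatives, n_probe):
    """Map each required partition id to the indices of the queries using it."""
    plan = {}
    for qi, query in enumerate(queries):
        for pid in nearest_partitions(query, representatives, n_probe):
            plan.setdefault(pid, []).append(qi)
    return plan


def search_batch(queries, representatives, fetch_partition, n_probe=1, k=1):
    """Answer a batch of queries, fetching each needed partition only once.

    fetch_partition(pid) is an assumed callback returning a list of
    (vector_id, vector) pairs; in a real system it would be a remote read.
    """
    plan = plan_batch_loads(queries, representatives, n_probe)
    candidates = [[] for _ in queries]
    for pid, query_ids in plan.items():
        vectors = fetch_partition(pid)  # one fetch, shared by all its queries
        for qi in query_ids:
            for vid, vec in vectors:
                # Brute-force scan used here in place of a sub-graph search.
                candidates[qi].append((dist(queries[qi], vec), vid))
    return [[vid for _, vid in sorted(c)[:k]] for c in candidates]


if __name__ == "__main__":
    reps = [(0.0, 0.0), (10.0, 10.0)]
    store = {0: [(0, (0.0, 1.0)), (1, (1.0, 0.0))],
             1: [(2, (10.0, 11.0))]}
    batch = [(0.0, 0.9), (1.0, 0.1), (10.0, 10.9)]
    print(search_batch(batch, reps, store.__getitem__))
```

In this sketch, two queries that map to the same partition trigger a single fetch rather than two, which is the redundancy the abstract refers to; how d-HNSW orders, caches, or pipelines those fetches is not specified here.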