Vector database search with frequent updates is increasingly critical in applications such as retrieval-augmented generation, recommendation systems, and large-scale embedding retrieval. Existing solutions, such as graph-based and partition-based approximate nearest neighbor search (ANNS), rely on indexes that depend on the data distribution and must therefore be rebuilt frequently, which disrupts continuous deployment and incurs long rebuild latency. This paper proposes an algorithm-hardware co-designed platform, ACRONYM, that addresses key problems with state-of-the-art vector database search. Algorithmically, it uses an efficient encoding that is independent of the data distribution, together with Hamming-distance-based search that is amenable to hardware acceleration. Architecturally, we propose CAM-based in-memory parallel distance computation followed by time-multiplexed approximate top-k selection to enable exhaustive search. CAM-based search is otherwise restricted to small vector dimensions by array capacity and wordline parasitics, so we propose a two-stage search, consisting of a coarse search followed by binary refinement, to achieve high recall. ACRONYM supports continuous updates without stalling and integrates a novel XOR-and-Accumulate (XAC) based systolic-array encoder for efficient on-chip encoding during search. Across million-scale datasets, while serving a dynamic database, ACRONYM achieves >90% recall at a throughput of 8 × 10^6 queries per second, with a memory footprint of only 32 MB and an average energy consumption of 2.56 µJ per query, corresponding to speedups of about 400x over HNSW (CPU) and about 80x over FAISS-IVF (GPU).
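The algorithmic idea summarized above (a distribution-independent binary encoding, exhaustive Hamming-distance search, and a coarse stage followed by binary refinement) can be illustrated with a minimal software sketch. The abstract does not specify ACRONYM's encoder, code lengths, or refinement scheme, so everything concrete here is an assumption made for illustration: sign-of-random-projection codes stand in for the encoder, the coarse stage compares only a short prefix of each code, and all names (`make_projection`, `encode`, `two_stage_search`) and parameters are hypothetical. The sketch models only the search logic, not the CAM array, the XAC encoder, or the approximate top-k hardware.

```python
import heapq
import random


def make_projection(dim, n_bits, seed=0):
    """Random +/-1 projection matrix, drawn before any data is seen.

    Assumption: a data-independent stand-in for the paper's encoder.
    """
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(dim)] for _ in range(n_bits)]


def encode(vec, projection):
    """Pack the signs of the projections into an integer (first row = MSB)."""
    code = 0
    for row in projection:
        dot = sum(w * x for w, x in zip(row, vec))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def two_stage_search(query_code, codes, n_bits, coarse_bits, n_candidates, k):
    """Exhaustive coarse scan on a short prefix, then full-code re-ranking."""
    if not 0 < coarse_bits <= n_bits:
        raise ValueError("coarse_bits must be in 1..n_bits")
    shift = n_bits - coarse_bits
    q_coarse = query_code >> shift
    # Stage 1 (coarse): compare only the leading coarse_bits of every code.
    candidates = heapq.nsmallest(
        n_candidates,
        range(len(codes)),
        key=lambda i: hamming(codes[i] >> shift, q_coarse),
    )
    # Stage 2 (refinement): re-rank the survivors with the full binary code.
    return heapq.nsmallest(
        k, candidates, key=lambda i: hamming(codes[i], query_code)
    )


if __name__ == "__main__":
    rng = random.Random(1)
    dim, n_bits = 16, 64
    proj = make_projection(dim, n_bits)
    db = [encode([rng.gauss(0, 1) for _ in range(dim)], proj) for _ in range(200)]

    # An update is a plain append: the projection never changes, so no
    # existing code has to be recomputed and nothing is rebuilt.
    new_vec = [rng.gauss(0, 1) for _ in range(dim)]
    db.append(encode(new_vec, proj))

    hits = two_stage_search(encode(new_vec, proj), db, n_bits,
                            coarse_bits=16, n_candidates=20, k=5)
    print(hits)
```

Because the projection is fixed independently of the stored vectors, inserting or deleting an entry touches only that entry's code. This is the property that lets such a scheme avoid the index rebuilds that graph-based and partition-based ANNS require.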