Approximate Nearest Neighbor (ANN) search has become fundamental to modern AI infrastructure, powering recommendation systems, search engines, and large language models at industry leaders from Google to OpenAI. Hierarchical Navigable Small World (HNSW) graphs have emerged as the dominant ANN algorithm, widely adopted in production systems for their superior recall-versus-latency trade-off. However, as vector databases scale to billions of embeddings, HNSW faces critical bottlenecks: memory consumption grows rapidly, distance-computation overhead dominates query latency, and performance degrades on heterogeneous data distributions. This paper presents Adaptive Quantization and Rerank HNSW (AQR-HNSW), a novel framework that synergistically integrates three strategies to enhance HNSW scalability. AQR-HNSW introduces (1) density-aware adaptive quantization, achieving 4x compression while preserving distance relationships; (2) multi-stage re-ranking that reduces unnecessary distance computations by 35%; and (3) quantization-optimized SIMD implementations delivering 16-64 operations per cycle across architectures. Evaluation on standard benchmarks demonstrates 2.5-3.3x higher queries per second (QPS) than state-of-the-art HNSW implementations while maintaining over 98% recall, with 75% lower memory use for the index graph and 5x faster index construction.
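To make the quantize-then-rerank idea concrete, the following is a minimal sketch (not the paper's actual algorithm): 8-bit scalar quantization compresses float32 vectors 4x, a cheap first pass ranks candidates by approximate distance over the compressed codes, and a second pass re-ranks a small shortlist with exact float distances. The function names and the uniform per-dimension scales are illustrative assumptions; the paper's density-aware scheme would adapt quantization to the local data distribution.

```python
import numpy as np

def quantize(vectors):
    # 8-bit scalar quantization: float32 -> uint8 (4x compression).
    # Uniform per-dimension scales; a simplification of density-aware quantization.
    lo = vectors.min(axis=0)
    hi = vectors.max(axis=0)
    scale = np.where(hi > lo, (hi - lo) / 255.0, 1.0)
    codes = np.round((vectors - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def search_with_rerank(query, vectors, codes, lo, scale, k=5, shortlist=20):
    # Stage 1: rank all points by approximate distance on dequantized codes.
    approx = codes.astype(np.float32) * scale + lo
    d_approx = np.linalg.norm(approx - query, axis=1)
    cand = np.argsort(d_approx)[:shortlist]
    # Stage 2: re-rank only the shortlist with exact float distances.
    d_exact = np.linalg.norm(vectors[cand] - query, axis=1)
    return cand[np.argsort(d_exact)[:k]]

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 64)).astype(np.float32)
query = data[42] + 0.01 * rng.normal(size=64).astype(np.float32)

codes, lo, scale = quantize(data)
top = search_with_rerank(query, data, codes, lo, scale)
```

In a real system the first pass would run directly on the uint8 codes with SIMD integer arithmetic rather than dequantizing; the shortlist size controls the accuracy-versus-cost trade-off of the re-ranking stage.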