Hierarchical Navigable Small World (HNSW) has demonstrated impressive accuracy and low latency for high-dimensional nearest-neighbor search. However, its high computational demands and irregular, large-volume data access patterns present significant challenges to search efficiency. To address these challenges, we introduce pHNSW, an algorithm-hardware co-optimized solution that accelerates HNSW through Principal Component Analysis (PCA) filtering. On the algorithm side, we apply PCA filtering to reduce the dimensionality of the dataset, thereby lowering the volume of neighbor accesses and decreasing the computational load of distance calculations. On the hardware side, we design the pHNSW processor with custom instructions to optimize search throughput and energy efficiency. In the experiments, we synthesized the pHNSW processor RTL design in a 65 nm technology node and evaluated it using the DDR4 and HBM1.0 DRAM standards. The results show that pHNSW boosts Queries per Second (QPS) by 14.47x-21.37x on a CPU and 5.37x-8.46x on a GPU, while reducing energy consumption by up to 57.4% compared with a standard HNSW implementation.
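To illustrate the idea of PCA filtering described above, the following is a minimal sketch, not the pHNSW implementation itself. It assumes one plausible reading of the filter: neighbors of the current graph node are first ranked by distance in a PCA-reduced space, and only the most promising ones are re-checked with full-dimensional distances. The function names (`fit_pca`, `filtered_nearest`) and the `keep` parameter are hypothetical and chosen only for this example.

```python
import numpy as np

def fit_pca(data, k):
    """Return (mean, components) for a k-dimensional PCA projection."""
    mean = data.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:k]

def filtered_nearest(query, neighbor_ids, data, data_low, mean, components, keep):
    """Pick the nearest neighbor among neighbor_ids using a low-dim pre-filter.

    Assumption: this two-stage filter is an illustrative stand-in for the
    PCA filtering step, not the authors' exact procedure.
    """
    q_low = (query - mean) @ components.T
    # Step 1: cheap distances in the reduced space for all neighbors.
    low_d = np.sum((data_low[neighbor_ids] - q_low) ** 2, axis=1)
    # Step 2: keep only the `keep` most promising candidates.
    survivors = neighbor_ids[np.argsort(low_d)[:keep]]
    # Step 3: exact full-dimensional distances for the survivors only.
    full_d = np.sum((data[survivors] - query) ** 2, axis=1)
    return survivors[np.argmin(full_d)], len(survivors)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(200, 32))          # synthetic vectors, illustration only
    mean, comps = fit_pca(data, k=8)
    data_low = (data - mean) @ comps.T         # precomputed low-dimensional copies
    ids = np.arange(50)                        # stand-in for one node's neighbor list
    best, n_full = filtered_nearest(data[7], ids, data, data_low, mean, comps, keep=10)
    print(best, n_full)
```

In this sketch only `keep` of the neighbors require a full-dimensional vector fetch and distance computation, which is the kind of reduction in data access and arithmetic that the abstract attributes to PCA filtering. The actual savings depend on the dataset and on how many principal components are retained.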