Approximate nearest neighbor search (ANNS) plays an indispensable role in a wide variety of applications, including recommendation systems, information retrieval, and semantic search. Among state-of-the-art ANNS algorithms, graph-based approaches offer superior accuracy and scalability on massive datasets. However, the best-performing graph-based ANNS solutions incur large memory footprints as well as costly distance computations, which hinders their efficient deployment at scale. 3D NAND flash is emerging as a promising device for data-intensive applications owing to its high density and nonvolatility. In this work, we present Proxima, a near-storage processing (NSP)-based ANNS solution that accelerates graph-based ANNS through algorithm-hardware co-design in 3D NAND flash. Proxima significantly reduces the complexity of graph search by leveraging distance approximation and early termination. On top of these algorithmic enhancements, we implement the Proxima search algorithm in 3D NAND flash using heterogeneous integration. To maximize the utilization of 3D NAND bandwidth, we present a customized dataflow and an optimized data allocation scheme. Our evaluation shows that, compared to graph-based ANNS on CPU and GPU, Proxima achieves an order-of-magnitude improvement in throughput or energy efficiency. Proxima also yields a 7× to 13× speedup over existing ASIC designs. Furthermore, Proxima strikes a good balance among accuracy, efficiency, and storage density compared to previous NSP-based accelerators.
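The abstract does not spell out how distance approximation and early termination are combined, so the following is only a rough illustration of the general idea, not Proxima's actual algorithm. It runs a standard best-first (beam) search over a proximity graph, ranks neighbors with distances computed on uniformly quantized vectors (an assumed stand-in for the paper's distance approximation), stops once the best approximate distance has not improved for `patience` consecutive expansions (an assumed stopping rule), and finally re-ranks the surviving candidates with exact distances. All names (`graph_search`, `beam`, `patience`, `levels`) are hypothetical.

```python
import heapq
import numpy as np


def quantize(vectors, levels=16):
    """Hypothetical approximation: uniform scalar quantization of every
    coordinate into `levels` buckets (assumed, not the paper's scheme)."""
    lo, hi = float(vectors.min()), float(vectors.max())
    scale = (hi - lo) / (levels - 1) if hi > lo else 1.0
    codes = np.round((vectors - lo) / scale).astype(np.uint8)
    return codes, lo, scale


def approx_dist(query, codes, lo, scale, idx):
    # Decode the quantized vector, then take the squared L2 distance to the query.
    recon = codes[idx].astype(np.float64) * scale + lo
    return float(np.sum((recon - query) ** 2))


def graph_search(query, vectors, graph, entry, k=1, beam=8, patience=3, levels=16):
    """Beam search on `graph` (dict: node -> neighbor list) using approximate
    distances, with a hypothetical patience-based early-termination rule."""
    codes, lo, scale = quantize(vectors, levels)
    visited = {entry}
    d0 = approx_dist(query, codes, lo, scale, entry)
    frontier = [(d0, entry)]   # min-heap of nodes still to expand
    results = [(-d0, entry)]   # max-heap (negated) holding the best `beam` nodes
    best, stale = d0, 0
    while frontier:
        d, node = heapq.heappop(frontier)
        # Usual beam-search stop: the closest unexpanded node is worse than
        # the worst candidate we are keeping.
        if len(results) >= beam and d > -results[0][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = approx_dist(query, codes, lo, scale, nb)
            if len(results) < beam or dn < -results[0][0]:
                heapq.heappush(frontier, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > beam:
                    heapq.heappop(results)  # drop the current worst candidate
        # Early termination (assumed rule): give up after `patience`
        # expansions without improving the best approximate distance.
        cur_best = -max(results)[0]
        if cur_best < best:
            best, stale = cur_best, 0
        else:
            stale += 1
            if stale >= patience:
                break
    # Re-rank the surviving candidates with exact distances.
    cands = [idx for _, idx in results]
    cands.sort(key=lambda i: float(np.sum((vectors[i] - query) ** 2)))
    return cands[:k], len(visited)


if __name__ == "__main__":
    # Toy example: 20 points on a line, each linked to its two neighbors.
    vecs = np.arange(20, dtype=np.float64).reshape(-1, 1)
    graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 20] for i in range(20)}
    ids, n_visited = graph_search(np.array([15.2]), vecs, graph, entry=0, k=3)
    print(ids, n_visited)
```

The exact re-ranking step is what lets a coarse approximation stay usable: the approximate distances only need to keep the true neighbors inside the beam, not order them perfectly. Raising `patience` trades extra node visits for a lower risk of stopping early.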