Approximate Nearest Neighbor Search (ANNS) is a core component of many production systems, particularly those that serve datasets with billions of entries. Because these applications demand low response times, the efficiency of the ANNS algorithm is critical. Traditional ANNS approaches, however, face substantial challenges at billion scale: CPU-based methods are limited by memory bandwidth, while GPU-based methods are constrained by memory capacity and by inefficient resource utilization. This paper introduces MemANNS, a framework that uses the UPMEM processing-in-memory (PIM) architecture to address the memory bottlenecks of ANNS algorithms at scale. We focus on optimizing a well-known ANNS algorithm, IVFPQ, for PIM hardware through three techniques. First, we introduce an architecture-aware strategy for data placement and query scheduling that distributes the workload evenly across PIM chips, thereby maximizing the use of aggregated memory bandwidth. Second, we develop an efficient thread scheduling mechanism that exploits PIM's multi-threading capabilities and improves memory management to boost cache efficiency. Third, we observe that vectors in real-world datasets often contain frequently co-occurring items, and we propose a novel encoding method for IVFPQ that exploits this property to reduce memory accesses during query processing. Our comprehensive evaluation on actual PIM hardware with billion-scale real-world datasets shows that MemANNS achieves 4.3x higher queries per second (QPS) than CPU-based Faiss and matches the performance of GPU-based Faiss implementations. MemANNS also improves energy efficiency, delivering 2.3x higher QPS/Watt than GPU solutions.
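To illustrate why the query-time scan in IVFPQ is dominated by memory accesses, the following is a minimal sketch of the standard product-quantization lookup-table scan over one inverted list. It is not the MemANNS implementation and does not reproduce the co-occurrence-aware encoding or any PIM-specific scheduling; all function names (`build_lut`, `adc_distance`, `decode`, `scan_list`) and the toy sizes are assumptions chosen for illustration. In full IVFPQ, the query would typically be taken relative to the coarse centroid of each probed cluster before the table is built; the sketch skips that step.

```python
import random

def build_lut(query, codebooks):
    """For each subspace, tabulate the squared L2 distance from the
    query sub-vector to every centroid of that subspace's codebook."""
    m = len(codebooks)
    dsub = len(query) // m
    lut = []
    for j, centroids in enumerate(codebooks):
        sub = query[j * dsub:(j + 1) * dsub]
        lut.append([sum((a - b) ** 2 for a, b in zip(sub, c)) for c in centroids])
    return lut

def adc_distance(code, lut):
    """Approximate distance: one table lookup per sub-quantizer code."""
    return sum(table[idx] for table, idx in zip(lut, code))

def decode(code, codebooks):
    """Reconstruct the approximate vector that a PQ code stands for."""
    out = []
    for centroids, idx in zip(codebooks, code):
        out.extend(centroids[idx])
    return out

def scan_list(codes, lut, k):
    """Scan every code in one inverted list and return the k smallest
    (distance, position) pairs."""
    scored = sorted((adc_distance(code, lut), i) for i, code in enumerate(codes))
    return scored[:k]

if __name__ == "__main__":
    rng = random.Random(0)
    m, ksub, dsub = 4, 16, 2  # toy sizes, hypothetical
    codebooks = [[[rng.random() for _ in range(dsub)] for _ in range(ksub)]
                 for _ in range(m)]
    codes = [[rng.randrange(ksub) for _ in range(m)] for _ in range(100)]
    query = [rng.random() for _ in range(m * dsub)]
    lut = build_lut(query, codebooks)
    print(scan_list(codes, lut, k=3))
```

Each encoded vector costs one table read per sub-quantizer and almost no arithmetic, which is why reducing the number of accesses per vector, as the proposed encoding aims to do, can translate directly into throughput.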