MemANNS: Enhancing Billion-Scale ANNS Efficiency with Practical PIM Hardware

Approximate Nearest Neighbor Search (ANNS) is indispensable in many production environments, particularly those that handle massive datasets containing billions of entries. Because these applications demand rapid response times, the efficiency of ANNS algorithms is crucial. Traditional ANNS approaches, however, face substantial challenges at billion scale: CPU-based methods are limited by memory bandwidth, while GPU-based methods struggle with memory capacity and resource utilization efficiency. This paper introduces MemANNS, a framework that uses the UPMEM PIM architecture to address the memory bottlenecks of ANNS algorithms at scale. We focus on optimizing a well-known ANNS algorithm, IVFPQ, for PIM hardware through several techniques. First, we introduce an architecture-aware strategy for data placement and query scheduling that distributes the workload evenly across PIM chips, thereby maximizing the use of aggregated memory bandwidth. Second, we develop an efficient thread scheduling mechanism that exploits PIM's multi-threading capabilities and improves memory management to boost cache efficiency. Third, we observe that vectors in real-world datasets often contain frequently co-occurring items, and we propose a novel encoding method for IVFPQ that minimizes memory accesses during query processing. Our comprehensive evaluation on actual PIM hardware with real-world billion-scale datasets shows that MemANNS delivers a 4.3x increase in QPS over CPU-based Faiss and matches the performance of GPU-based Faiss implementations. MemANNS also improves energy efficiency, with a 2.3x improvement in QPS/Watt compared to GPU solutions.
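For readers unfamiliar with IVFPQ, the following is a minimal sketch of a generic IVFPQ query in plain Python. It is not the MemANNS implementation and does not include the paper's PIM-specific placement, scheduling, or encoding techniques. All names (`build_lookup_table`, `adc_distance`, `ivfpq_search`) and the toy data layout are assumptions made for illustration. The sketch shows why the scan is memory-bound: every stored code costs one table lookup per subspace and very little arithmetic.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_lookup_table(residual, codebooks):
    """Precompute distances from each query sub-vector to every sub-centroid.

    codebooks[m][k] is centroid k of subspace m; all have the same length.
    """
    dsub = len(codebooks[0][0])
    table = []
    for m, book in enumerate(codebooks):
        sub = residual[m * dsub:(m + 1) * dsub]
        table.append([sq_dist(sub, centroid) for centroid in book])
    return table


def adc_distance(code, table):
    """Approximate distance of one PQ code: one table lookup per subspace."""
    return sum(table[m][k] for m, k in enumerate(code))


def ivfpq_search(query, coarse_centroids, inverted_lists, codebooks,
                 nprobe, topk):
    """Scan the nprobe closest inverted lists and return (distance, id) pairs.

    inverted_lists[c] is a list of (vector_id, code) tuples, where each code
    encodes the residual of the vector relative to coarse centroid c.
    """
    order = sorted(range(len(coarse_centroids)),
                   key=lambda c: sq_dist(query, coarse_centroids[c]))
    results = []
    for c in order[:nprobe]:
        residual = [q - x for q, x in zip(query, coarse_centroids[c])]
        table = build_lookup_table(residual, codebooks)
        for vec_id, code in inverted_lists[c]:
            results.append((adc_distance(code, table), vec_id))
    results.sort()
    return results[:topk]


# Toy index (hypothetical data): 4-dimensional vectors, 2 subspaces,
# 2 sub-centroids per subspace, 2 coarse clusters.
codebooks = [[[0, 0], [1, 1]], [[0, 0], [2, 2]]]
coarse_centroids = [[0, 0, 0, 0], [10, 10, 10, 10]]
inverted_lists = [
    [(0, (0, 0)), (1, (1, 1))],
    [(2, (0, 0)), (3, (1, 0))],
]
hits = ivfpq_search([1, 1, 2, 2], coarse_centroids, inverted_lists,
                    codebooks, nprobe=1, topk=2)
```

In this sketch the inner loop over `inverted_lists[c]` dominates the cost at scale, which is the part a bandwidth-oriented design would aim to spread across memory units.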
