RedNote (a.k.a. Xiaohongshu, a global-scale social network platform) widely adopts approximate nearest neighbor search (ANNS) to power its search, recommendation, and advertising services. Because of demanding Service Level Agreements (SLAs), we have had to rely on in-memory graph-based ANNS (i.e., HNSW) to deliver high throughput and low latency. However, the ever-growing user base and content volume have caused an explosive increase in memory footprint and, consequently, huge CapEx and OpEx. After exploring various alternatives, we find that building a clustering-based ANNS on top of all-flash servers is promising. Yet such a design still suffers from severe kernel I/O stack overhead, a fixed pruning strategy, and slow index construction. We present HELMSMAN, a high-performance and cost-effective clustering-based ANNS system that combines an ANNS-oriented userspace storage stack, a leveling-learned pruning module, and GPU-accelerated construction pipelines. HELMSMAN saves over 90% of hardware costs and enables billion-scale index (re)builds within hours. In the current production deployment, which has operated stably for several months, 40 machines host ANNS workloads that previously required about 35,000 cores and 0.35 PB of DRAM.