Product Quantization (PQ) construction is deeply integrated into vector index construction for Approximate Nearest Neighbor Search (ANNS). The rapid growth in vector dimensionality and data volume has significantly increased the computational cost of PQ. Existing GPU-based PQ acceleration approaches are ill-suited to PQ construction because it follows a "one-to-one" execution pattern (each computation requires its own data load, so data-transfer overhead dominates). Although CPU-based solutions are prevalent, they are essentially general-purpose designs that fail to capture the intrinsic characteristics of PQ construction. In this paper, we propose CS-PQ, a Cache-friendly, SIMD-optimized PQ framework for modern CPUs. CS-PQ introduces a vector-oriented SIMD paradigm that decouples quantization granularity from SIMD width by vectorizing across PQ centroids rather than across subvector dimensions. It further restructures the execution pipeline to improve cache locality and reformulates the PQ computation to eliminate redundant operations while preserving correctness. Experiments on large-scale datasets show that CS-PQ achieves a speedup of up to 10.7× over state-of-the-art CPU-based PQ implementations without sacrificing ANNS accuracy.
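To make the idea of vectorizing across PQ centroids rather than subvector dimensions concrete, the following is a minimal scalar sketch, not CS-PQ's actual implementation. It assumes a transposed codebook layout in which each row holds one dimension of all K centroids, so that the inner loop runs over centroids and would map onto SIMD lanes regardless of the subvector length. The names `encode_subvector`, `encode`, and `codebook_t` are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: the layout and function names are assumptions,
# not taken from CS-PQ.

def encode_subvector(x_sub, codebook_t):
    """Return the index of the nearest centroid for one subvector.

    x_sub:      list of d_sub floats (one subvector of the input vector).
    codebook_t: transposed codebook with d_sub rows of K floats each,
                so codebook_t[j][k] is dimension j of centroid k.
    """
    k_count = len(codebook_t[0])
    # One accumulator per centroid; in a SIMD kernel these would be the lanes.
    acc = [0.0] * k_count
    for j, xj in enumerate(x_sub):
        row = codebook_t[j]  # dimension j of all centroids, stored contiguously
        # Lane-wise loop over centroids: its length is K, independent of d_sub.
        for k in range(k_count):
            diff = xj - row[k]
            acc[k] += diff * diff
    # Pick the centroid with the smallest squared Euclidean distance.
    return min(range(k_count), key=acc.__getitem__)


def encode(x, codebooks_t):
    """Encode a vector into one centroid index per subspace."""
    m = len(codebooks_t)
    d_sub = len(x) // m
    return [
        encode_subvector(x[i * d_sub:(i + 1) * d_sub], codebooks_t[i])
        for i in range(m)
    ]
```

In this arrangement the amount of lane-parallel work per step depends on the number of centroids, not on how many dimensions a subvector happens to have, which is one way to read the decoupling of quantization granularity from SIMD width described above.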