Multivector retrieval models achieve state-of-the-art effectiveness through fine-grained token-level representations, but their deployment incurs substantial computational and memory costs. Current solutions rely on the well-known k-means clustering algorithm to group similar vectors, enabling both effective compression and efficient retrieval. However, standard k-means scales poorly with the number of clusters and the dataset size. During training it also favours frequent tokens and underrepresents rare, discriminative ones. In this work, we introduce TACHIOM, a multivector retrieval system that exploits token-level structure to significantly accelerate both clustering and retrieval. By accounting for the token distribution during centroid allocation, TACHIOM scales to millions of centroids. This enables highly accurate document scoring with centroids alone and avoids expensive token-level computation. TACHIOM combines a graph-based index over centroids with an optimized Product Quantization layout for efficient final scoring. Experiments on MS-MARCOv1 and LoTTE show that TACHIOM achieves up to $247\times$ faster clustering than k-means and up to $9.8\times$ retrieval speedup over state-of-the-art systems, while maintaining comparable or superior effectiveness.
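To make "document scoring using only centroids" concrete, the sketch below shows generic centroid-only late-interaction (MaxSim) scoring. It is a minimal illustration under stated assumptions, not TACHIOM's actual implementation. We assume that each document token has already been assigned a single centroid id, and that a document's score is the sum, over query tokens, of the best query/centroid dot product among that document's centroids. The function name `centroid_maxsim_scores` and the input layout are hypothetical.

```python
import numpy as np

def centroid_maxsim_scores(Q, C, doc_codes):
    """Hypothetical centroid-only MaxSim scoring (illustrative sketch).

    Q: (n_q, dim) query token embeddings
    C: (n_centroids, dim) centroid matrix
    doc_codes: list of 1-D int arrays; doc_codes[d][t] is the centroid id
               assumed to be assigned to token t of document d
    """
    # One matrix product yields every query-token/centroid similarity,
    # so no per-document token vectors are ever touched.
    S = Q @ C.T  # shape (n_q, n_centroids)
    scores = np.empty(len(doc_codes))
    for d, codes in enumerate(doc_codes):
        # For each query token, keep the best-matching centroid among the
        # document's tokens (MaxSim), then sum over query tokens.
        scores[d] = S[:, codes].max(axis=1).sum()
    return scores

# Toy example: 2 query tokens, 3 centroids, 3 documents.
Q = np.array([[1.0, 0.0], [0.0, 1.0]])
C = np.array([[1.0, 0.0], [0.0, 1.0], [0.7071, 0.7071]])
doc_codes = [np.array([0, 1]), np.array([2]), np.array([0])]

scores = centroid_maxsim_scores(Q, C, doc_codes)
ranking = np.argsort(-scores)  # document ids, best first
print(scores, ranking)
```

The toy example scores roughly `[2.0, 1.414, 1.0]`, so the documents rank in the order 0, 1, 2. The sketch computes `S` against every centroid. A system with millions of centroids would presumably restrict this computation to candidate centroids, for instance those reached through the graph-based centroid index the abstract mentions. It would then refine the top candidates with the Product Quantization stage. Neither step is modelled here.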