Late-interaction models like ColBERT offer competitive performance across various retrieval tasks, but require storing a dense embedding for each document token, leading to substantial index storage overhead. Prior work addresses this by pruning low-importance token embeddings based on statistical and empirical measures, but these methods often either lack formal grounding or are ineffective. To address these shortcomings, we introduce a framework grounded in hyperspace geometry and cast token pruning as a Voronoi cell estimation problem in the embedding space. By interpreting each token's influence as the measure of its Voronoi region, our approach enables principled pruning that retains retrieval quality while reducing index size. Through our experiments, we demonstrate that this approach serves not only as a competitive pruning strategy but also as a valuable tool for improving and interpreting token-level behavior within dense retrieval systems.
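The core idea can be illustrated with a small sketch: estimate each token embedding's Voronoi cell measure by Monte Carlo sampling on the unit hypersphere (ColBERT-style embeddings are L2-normalized), then keep only the tokens whose cells are largest. The function names, the sampling scheme, and the `keep_ratio` parameter are illustrative assumptions, not the paper's actual estimator.

```python
import numpy as np

def estimate_voronoi_measures(token_embs, n_samples=20000, seed=0):
    """Monte Carlo estimate of each token's Voronoi cell measure.

    Draws points uniformly on the unit hypersphere (normalized
    Gaussian samples) and counts how often each token embedding
    is the nearest neighbor; the hit frequency approximates the
    spherical measure of that token's Voronoi region.
    """
    rng = np.random.default_rng(seed)
    d = token_embs.shape[1]
    samples = rng.standard_normal((n_samples, d))
    samples /= np.linalg.norm(samples, axis=1, keepdims=True)
    # Nearest token by dot product (embeddings assumed L2-normalized,
    # so this matches nearest-neighbor under cosine similarity).
    nearest = (samples @ token_embs.T).argmax(axis=1)
    counts = np.bincount(nearest, minlength=token_embs.shape[0])
    return counts / n_samples

def prune_tokens(token_embs, keep_ratio=0.5):
    """Return indices of tokens with the largest estimated cell measure."""
    measures = estimate_voronoi_measures(token_embs)
    k = max(1, int(len(token_embs) * keep_ratio))
    keep = np.argsort(measures)[::-1][:k]
    return np.sort(keep)

# Toy usage: 8 random unit-norm token embeddings in 16 dims, keep half.
embs = np.random.default_rng(1).standard_normal((8, 16))
embs /= np.linalg.norm(embs, axis=1, keepdims=True)
kept = prune_tokens(embs, keep_ratio=0.5)
```

Tokens whose embeddings sit in dense clusters claim small Voronoi cells and contribute little to max-similarity scoring, so they are the natural pruning candidates under this view.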