Late-interaction models like ColBERT offer competitive performance across various retrieval tasks, but they require storing a dense embedding for each document token, leading to substantial index storage overhead. Prior work addresses this by pruning low-importance token embeddings based on statistical and empirical measures, but these methods often either lack formal grounding or are ineffective. To address these shortcomings, we introduce a framework grounded in hyperspace geometry and cast token pruning as a Voronoi cell estimation problem in the embedding space. By interpreting each token's influence as the measure of its Voronoi region, our approach enables principled pruning that retains retrieval quality while reducing index size. Through our experiments, we demonstrate that this approach serves not only as a competitive pruning strategy but also as a valuable tool for improving and interpreting token-level behavior within dense retrieval systems.
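To make the Voronoi intuition concrete, here is a minimal sketch (not the paper's exact estimator) of how a token's Voronoi cell measure could be approximated by Monte Carlo sampling on the unit hypersphere: a token's influence is taken to be the fraction of random unit vectors whose nearest embedding (by inner product) is that token, and tokens with the smallest cells are pruned first. The function name, sampling scheme, and `keep_ratio` parameter are illustrative assumptions.

```python
# Hypothetical sketch: Monte Carlo estimate of each token embedding's
# Voronoi cell measure on the unit hypersphere, used as a pruning score.
import numpy as np

def voronoi_prune(embeddings: np.ndarray, keep_ratio: float = 0.5,
                  n_samples: int = 100_000, seed: int = 0) -> np.ndarray:
    """Return indices of the token embeddings to keep.

    embeddings: (n_tokens, dim) array, assumed L2-normalized, as in ColBERT.
    """
    rng = np.random.default_rng(seed)
    n_tokens, dim = embeddings.shape
    # Gaussian samples normalized to unit length are uniform on the sphere.
    samples = rng.standard_normal((n_samples, dim))
    samples /= np.linalg.norm(samples, axis=1, keepdims=True)
    # Assign each sample to its nearest token under maximum inner product;
    # a token's cell measure is the fraction of samples it captures.
    nearest = (samples @ embeddings.T).argmax(axis=1)
    cell_measure = np.bincount(nearest, minlength=n_tokens) / n_samples
    # Keep the tokens with the largest Voronoi cells.
    n_keep = max(1, int(keep_ratio * n_tokens))
    return np.argsort(cell_measure)[::-1][:n_keep]
```

In this sketch, a token whose embedding is crowded by near-duplicates captures few samples and is pruned early, matching the intuition that redundant tokens contribute little to retrieval.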