Multi-vector dense retrieval methods like ColBERT systematically use a single-layer linear projection to reduce the dimensionality of individual vectors. In this study, we explore the implications of the MaxSim operator for gradient flow during the training of multi-vector models and show that such a simple linear projection has inherent, if non-critical, limitations in this setting. We then discuss how replacing this single-layer projection with well-studied alternative feedforward networks (FFNs), such as deeper non-linear FFN blocks, GLU blocks, and skip connections, could in theory alleviate these limitations. Through the design and systematic evaluation of alternative projection blocks, we show that better-designed final projections improve the downstream performance of ColBERT models. Many projection variants outperform the original linear projection, with the best-performing variants increasing average performance on a range of retrieval benchmarks across domains by over 2 NDCG@10 points. We then examine the individual parameters of these projection blocks to understand what drives this empirical performance, highlighting the particular importance of upscaled intermediate projections and residual connections. As part of these ablation studies, we show that numerous suboptimal projection variants still outperform the traditional single-layer projection across multiple benchmarks, confirming our hypothesis. Finally, we observe that this effect is consistent across random seeds, further confirming that replacing the linear layer of ColBERT models is a robust, drop-in upgrade.
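To make the gradient-flow argument concrete, the following is a minimal, dependency-free sketch of MaxSim scoring as it is commonly defined for late-interaction models: each query token is matched to its most similar document token, and the per-token maxima are summed. The function names (`dot`, `maxsim`) and the toy vectors are assumptions made for illustration and are not taken from the paper's code. The sketch also records which document tokens are selected by the max, since only those tokens receive a nonzero gradient from this score.

```python
# Minimal sketch of MaxSim late-interaction scoring (illustrative only).
# The names `dot` and `maxsim` and the toy vectors are hypothetical.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_vecs, doc_vecs):
    """Return (score, winners).

    score   : sum over query tokens of the max similarity to any doc token
    winners : for each query token, the index of the doc token achieving the max
    """
    score = 0.0
    winners = []
    for q in query_vecs:
        sims = [dot(q, d) for d in doc_vecs]
        best = max(range(len(sims)), key=sims.__getitem__)
        winners.append(best)
        score += sims[best]
    return score, winners


# Toy 2-dimensional token vectors, made up for this example.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.1]]

score, winners = maxsim(query, doc)

# Only doc tokens selected by at least one max contribute to the score,
# so only they receive a nonzero gradient from this query-document pair.
doc_tokens_with_gradient = sorted(set(winners))
print(score, winners, doc_tokens_with_gradient)
```

In this toy case the third document token is never selected, so this pair's score sends no gradient through it. That sparsity is the property of MaxSim that the analysis of the projection layer builds on.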