While Late Interaction models exhibit strong retrieval performance, many of their underlying dynamics remain understudied, potentially hiding performance bottlenecks. In this work, we focus on two topics in Late Interaction retrieval: a length bias that arises when using multi-vector scoring, and the similarity distribution beyond the best scores pooled by the MaxSim operator. We analyze these behaviors for state-of-the-art models on the NanoBEIR benchmark. Results show that while the theoretical length bias of causal Late Interaction models holds in practice, bi-directional models can also suffer from it in extreme cases. We also note that no significant similarity trend lies beyond the top-1 document token, validating that the MaxSim operator efficiently exploits the token-level similarity scores.
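For readers unfamiliar with the scoring rule discussed above, the following is a minimal sketch of the MaxSim operator: each query token keeps only its highest similarity over the document tokens, and these maxima are summed. The function names (`maxsim_score`, `l2_normalize`) and the random embeddings are illustrative assumptions, not the models or data analyzed in this work. The sketch also shows one way a length bias can arise, assuming that appending tokens leaves the existing token embeddings unchanged (as in a causal encoder): the maximum over a larger token set can never be smaller, so the score of the extended document cannot decrease.

```python
import numpy as np


def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """MaxSim: sum over query tokens of the best similarity to any document token.

    query_emb has shape (n_query_tokens, dim); doc_emb has shape (n_doc_tokens, dim).
    """
    sim = query_emb @ doc_emb.T          # (n_query_tokens, n_doc_tokens) dot products
    return float(sim.max(axis=1).sum())  # best document token per query token, summed


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Row-wise L2 normalization, so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


# Hypothetical stand-in embeddings; a real model would produce these.
rng = np.random.default_rng(0)
query = l2_normalize(rng.normal(size=(4, 8)))
doc_short = l2_normalize(rng.normal(size=(10, 8)))

# Assumption: appending tokens does not alter the embeddings of earlier tokens,
# which is the causal-encoder setting.
extra_tokens = l2_normalize(rng.normal(size=(30, 8)))
doc_long = np.vstack([doc_short, extra_tokens])

score_short = maxsim_score(query, doc_short)
score_long = maxsim_score(query, doc_long)
print(f"short document: {score_short:.4f}, extended document: {score_long:.4f}")
```

With a bi-directional encoder the earlier token embeddings would change when text is appended, so this monotonicity argument does not apply directly.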