Cross-encoders are effective passage and document re-rankers, but they are less efficient than other neural or classic retrieval models. A few previous studies have applied windowed self-attention to make cross-encoders more efficient. However, these studies did not investigate the potential and limits of different attention patterns or window sizes. We close this gap and systematically analyze how token interactions can be reduced without harming re-ranking effectiveness. Experimenting with asymmetric attention and different window sizes, we find that the query tokens do not need to attend to the passage or document tokens for effective re-ranking and that very small window sizes suffice. In our experiments, even windows of 4 tokens still yield effectiveness on par with previous cross-encoders while reducing memory requirements by at least 22% for passages and 59% for documents and being 1% (passages) and 43% (documents) faster at inference time.
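To make the idea of asymmetric, windowed attention concrete, the following is a minimal sketch of an attention mask and not the authors' implementation. It assumes a hypothetical input layout of query tokens followed by passage tokens, and it assumes that `window` counts the neighbours on each side of a passage token; the function names and the omission of special tokens such as [CLS] are likewise illustrative choices. Query tokens attend only to query tokens, while passage tokens attend to the full query and to nearby passage tokens.

```python
def asymmetric_window_mask(num_query: int, num_passage: int, window: int) -> list[list[bool]]:
    """Return mask[i][j] == True when token i may attend to token j.

    Assumed (hypothetical) layout: positions 0..num_query-1 hold the query,
    followed by num_passage passage tokens. `window` is the number of
    neighbours on each side that a passage token may attend to.
    """
    n = num_query + num_passage
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            i_is_query = i < num_query
            j_is_query = j < num_query
            if i_is_query:
                # Asymmetric part: query tokens never look at passage tokens.
                mask[i][j] = j_is_query
            elif j_is_query:
                # Passage tokens may still read the whole query.
                mask[i][j] = True
            else:
                # Passage-to-passage attention is restricted to a local window.
                mask[i][j] = abs(i - j) <= window
    return mask


def allowed_fraction(mask: list[list[bool]]) -> float:
    """Share of token pairs that remain connected, relative to full attention."""
    n = len(mask)
    return sum(sum(row) for row in mask) / (n * n)


# Small illustrative sizes; real passages and documents are much longer.
mask = asymmetric_window_mask(num_query=3, num_passage=8, window=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
print(f"allowed interactions: {allowed_fraction(mask):.0%}")
```

The printed grid shows an empty upper-right block (query rows ignoring passage columns) and a diagonal band in the passage-to-passage block. The fraction of allowed interactions is only a rough proxy: actual memory and speed gains depend on how the sparse pattern is implemented, so it should not be read as reproducing the percentages reported above.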