Cross-encoders are effective passage and document re-rankers but less efficient than other neural or classical retrieval models. A few previous studies have applied windowed self-attention to make cross-encoders more efficient, but they did not investigate the potential and limits of different attention patterns or window sizes. We close this gap and systematically analyze how token interactions can be reduced without harming re-ranking effectiveness. Experimenting with asymmetric attention and different window sizes, we find that the query tokens do not need to attend to the passage or document tokens for effective re-ranking and that very small window sizes suffice. In our experiments, even windows of 4 tokens still yield effectiveness on par with previous cross-encoders while reducing memory requirements to at most 78% (passages) and 41% (documents) of theirs and speeding up inference by 1% (passages) and 43% (documents).
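To make the asymmetric, windowed attention pattern concrete, here is a minimal sketch of the corresponding attention mask. It assumes a [query tokens; passage tokens] input layout and a PyTorch-style boolean mask; the function name `asymmetric_window_mask` and the interpretation of the window size as a symmetric radius are illustrative assumptions, not the authors' implementation.

```python
import torch

def asymmetric_window_mask(num_query: int, num_passage: int, window: int) -> torch.Tensor:
    """Boolean attention mask; True marks an allowed query-key interaction."""
    n = num_query + num_passage
    mask = torch.zeros(n, n, dtype=torch.bool)

    # Asymmetric attention: query tokens attend only to other query tokens
    # (per the abstract, they need not attend to passage/document tokens).
    mask[:num_query, :num_query] = True

    # Passage tokens attend to all query tokens ...
    mask[num_query:, :num_query] = True

    # ... and to passage tokens within a small local window around them.
    for i in range(num_passage):
        lo = max(0, i - window)
        hi = min(num_passage, i + window + 1)
        mask[num_query + i, num_query + lo : num_query + hi] = True

    return mask

# Example: 8 query tokens, 128 passage tokens, a window of 4 tokens.
mask = asymmetric_window_mask(8, 128, 4)
```

Such a mask keeps the number of attended positions per passage token roughly constant in the window size rather than linear in the sequence length, which is where the reported memory and inference savings come from.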