The Transformer architecture has achieved great success in multiple applied machine learning communities, such as natural language processing (NLP), computer vision (CV), and information retrieval (IR). Attention, the core mechanism of the Transformer architecture, requires $O(n^2)$ time complexity in training and $O(n)$ time complexity per step in inference. Many works have been proposed to improve the scalability of the attention mechanism, such as FlashAttention and multi-query attention. A different line of work aims to design new mechanisms to replace attention altogether. Recently, Mamba, a notable model architecture based on state space models, has achieved Transformer-equivalent performance on multiple sequence modeling tasks. In this work, we examine Mamba's efficacy through the lens of a classical IR task: document ranking. A reranker model takes a query and a document as input and predicts a scalar relevance score. This task demands that the language model comprehend lengthy contextual inputs and capture the interaction between query and document tokens. We find that \textbf{(1) Mamba models achieve competitive performance compared to transformer-based models trained with the same recipe; (2) however, they have lower training throughput than efficient transformer implementations such as FlashAttention.} We hope this study can serve as a starting point for exploring Mamba models in other classical IR tasks. Our \href{https://github.com/zhichaoxu-shufe/RankMamba}{code implementation} is made public to facilitate reproducibility. Refer to~\cite{xu-etal-2025-state} for more comprehensive experiments and results, including passage ranking.
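To make the reranking setup concrete, the following is a minimal sketch of how a cross-encoder reranker scores a (query, document) pair with a Hugging Face sequence-classification head; the checkpoint name below is an illustrative placeholder, not the model trained in this work, and the actual training recipe is described in the paper and repository.

\begin{verbatim}
# Minimal sketch of cross-encoder reranking: the model reads the
# concatenated query and document and outputs one scalar relevance score.
# The checkpoint name is a placeholder for illustration only.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # illustrative choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def rerank_score(query: str, document: str) -> float:
    """Return a scalar relevance score for a (query, document) pair."""
    inputs = tokenizer(query, document, truncation=True,
                       max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape (1, 1): single score head
    return logits.squeeze().item()

print(rerank_score("what is mamba",
                   "Mamba is a sequence model based on state spaces."))
\end{verbatim}

In practice, candidate documents retrieved by a first-stage ranker are scored with such a function and reordered by descending score; the same interface applies whether the backbone is a Transformer or a state space model such as Mamba.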