Transformers dominate NLP and IR, but their inference inefficiencies and difficulty extrapolating to longer contexts have sparked interest in alternative model architectures. Among these, state space models (SSMs) such as Mamba offer promising advantages, notably $O(1)$ per-token time complexity at inference. Despite this potential, the effectiveness of SSMs for text reranking -- a task requiring fine-grained query-document interaction and long-context understanding -- remains underexplored. This study benchmarks SSM-based architectures (specifically, Mamba-1 and Mamba-2) against transformer-based models across various scales, architectures, and pre-training objectives, focusing on performance and efficiency in text reranking. We find that (1) Mamba architectures achieve competitive text ranking performance, comparable to transformer-based models of similar size; (2) they are less efficient in training and inference than transformers with flash attention; and (3) Mamba-2 outperforms Mamba-1 in both performance and efficiency. These results underscore the potential of state space models as an alternative to transformers and highlight areas for improvement in future IR applications.