Transformers dominate NLP and IR, but their inference inefficiencies and difficulty extrapolating to longer contexts have sparked interest in alternative model architectures. Among these, state space models (SSMs) such as Mamba offer promising advantages, notably $O(1)$ per-token time complexity at inference. Despite this potential, the effectiveness of SSMs for text reranking -- a task requiring fine-grained query-document interaction and long-context understanding -- remains underexplored. This study benchmarks SSM-based architectures (specifically, Mamba-1 and Mamba-2) against transformer-based models across various scales, architectures, and pre-training objectives, focusing on performance and efficiency in text reranking. We find that (1) Mamba architectures achieve competitive text ranking performance, comparable to transformer-based models of similar size; (2) they are less efficient in training and inference than transformers with flash attention; and (3) Mamba-2 outperforms Mamba-1 in both performance and efficiency. These results underscore the potential of state space models as an alternative to transformers and highlight areas for improvement in future IR applications.