Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model

Transformer-based segmentation methods struggle to run inference efficiently on high-resolution images. Recently, several linear attention architectures, such as Mamba and RWKV, have attracted attention because they process long sequences efficiently. In this work, we explore these architectures to design an efficient segment-anything model. Specifically, we design a mixed backbone that combines convolution and RWKV operations, which performs best in both accuracy and efficiency. In addition, we design an efficient decoder that uses the multiscale tokens to obtain high-quality masks. We denote our method as RWKV-SAM, a simple, effective, and fast baseline for SAM-like models. Moreover, we build a benchmark containing various high-quality segmentation datasets and jointly train one efficient yet high-quality segmentation model on it. On this benchmark, RWKV-SAM achieves outstanding efficiency and segmentation quality compared to transformers and other linear attention models. For example, compared with a transformer model of the same scale, RWKV-SAM runs more than 2x faster and achieves better segmentation performance on various datasets. In addition, RWKV-SAM outperforms recent vision Mamba models in both classification and semantic segmentation. Code and models will be publicly available.
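
The efficiency concern raised at the start of the abstract can be illustrated with a back-of-the-envelope count. The sketch below is not taken from the paper and does not model RWKV-SAM itself. It assumes square images split into non-overlapping 16x16 patches (the patch size and the resolutions are illustrative assumptions), and it compares the number of token-to-token interactions in full pairwise attention with the single pass over tokens made by a linear-time recurrence, ignoring all constants. The helper names `token_count` and `mixing_cost` are hypothetical.

```python
def token_count(height: int, width: int, patch: int) -> int:
    """Number of non-overlapping patch tokens for an image (assumed patching scheme)."""
    return (height // patch) * (width // patch)


def mixing_cost(n_tokens: int) -> dict:
    """Rough per-layer interaction counts, with constants and channel width ignored."""
    return {
        # Full self-attention: every token interacts with every token.
        "pairwise_attention": n_tokens * n_tokens,
        # Linear-time recurrence: one state update per token.
        "linear_recurrence": n_tokens,
    }


if __name__ == "__main__":
    # Illustrative resolutions only; they are not the paper's settings.
    for side in (256, 512, 1024):
        n = token_count(side, side, patch=16)
        cost = mixing_cost(n)
        ratio = cost["pairwise_attention"] // cost["linear_recurrence"]
        print(f"{side}x{side}: {n} tokens, attention/linear ratio = {ratio}")
```

Under these assumptions the gap between the two counts equals the token count, so it grows with image area. This is only a scaling argument: the speedup reported in the abstract is an empirical result and is not derived from this count.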

