Context modeling is critical for dense prediction tasks on remote sensing images. The growing size of very-high-resolution (VHR) remote sensing images poses challenges for effective context modeling. While transformer-based models possess global modeling capabilities, they encounter computational challenges when applied to large VHR images due to their quadratic complexity. The conventional practice of cropping large images into smaller patches results in a notable loss of contextual information. To address these issues, we propose the Remote Sensing Mamba (RSM) for dense prediction tasks on large VHR remote sensing images. RSM is specifically designed to capture the global context of remote sensing images with linear complexity, enabling the effective processing of large VHR images. Because land covers in remote sensing images are distributed in arbitrary spatial directions, owing to the overhead imaging characteristic of remote sensing, RSM incorporates an omnidirectional selective scan module that globally models image context along multiple directions, capturing large spatial features from various orientations. Extensive experiments on semantic segmentation and change detection tasks across various land covers demonstrate the effectiveness of the proposed RSM. We design simple yet effective models based on RSM that achieve state-of-the-art performance on dense prediction tasks for VHR remote sensing images without elaborate training strategies. Leveraging its linear complexity and global modeling capability, RSM achieves better efficiency and accuracy than transformer-based models on large remote sensing images. Interestingly, we also demonstrate that our model generally performs better with larger image sizes on dense prediction tasks. Our code is available at https://github.com/walking-shadow/Official_Remote_Sensing_Mamba.
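To make the omnidirectional scanning idea concrete, the sketch below flattens a 2D feature map into 1D token sequences along several scan directions. This is a simplified, hypothetical illustration only: the function name `omnidirectional_scans` and the exact set of directions are our assumptions, and the actual RSM module additionally runs a selective state-space (Mamba) model over each sequence and merges the results, which is omitted here.

```python
import numpy as np

def omnidirectional_scans(x: np.ndarray) -> dict:
    """Flatten a 2D feature map into 1D sequences along multiple
    scan directions (hypothetical sketch of the scanning step;
    the per-sequence selective SSM and the merge are omitted)."""
    return {
        # left-to-right, top-to-bottom
        "row_forward": x.reshape(-1).copy(),
        # right-to-left, bottom-to-top (reverse of row_forward)
        "row_backward": x.reshape(-1)[::-1].copy(),
        # top-to-bottom, left-to-right (column-major order)
        "col_forward": x.T.reshape(-1).copy(),
        # bottom-to-top, right-to-left (reverse of col_forward)
        "col_backward": x.T.reshape(-1)[::-1].copy(),
    }

# Usage: a tiny 2x3 "feature map" with values 0..5.
feat = np.arange(6).reshape(2, 3)
scans = omnidirectional_scans(feat)
```

Each sequence has length H*W, so processing every direction with a linear-time state-space model keeps the overall cost linear in the number of pixels, in contrast to the quadratic cost of full self-attention over the same tokens.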