Context modeling is critical for dense prediction tasks on remote sensing images. The growing size of very-high-resolution (VHR) remote sensing images poses challenges for effective context modeling. While transformer-based models possess global modeling capabilities, they encounter computational challenges when applied to large VHR images due to their quadratic complexity. The conventional practice of cropping large images into smaller patches results in a notable loss of contextual information. To address these issues, we propose the Remote Sensing Mamba (RSM) for dense prediction tasks on large VHR remote sensing images. RSM is specifically designed to capture the global context of remote sensing images with linear complexity, enabling the effective processing of large VHR images. Because land covers in remote sensing images are distributed in arbitrary spatial directions, owing to the overhead imaging characteristic of remote sensing, RSM incorporates an omnidirectional selective scan module that globally models image context along multiple directions, capturing large spatial features from various orientations. Extensive experiments on semantic segmentation and change detection tasks across various land covers demonstrate the effectiveness of the proposed RSM. We design simple yet effective models based on RSM that achieve state-of-the-art performance on dense prediction tasks for VHR remote sensing images without elaborate training strategies. Leveraging its linear complexity and global modeling capability, RSM achieves better efficiency and accuracy than transformer-based models on large remote sensing images. Interestingly, we also demonstrate that our model generally performs better with larger image sizes on dense prediction tasks. Our code is available at https://github.com/walking-shadow/Official_Remote_Sensing_Mamba.
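To make the omnidirectional scanning idea concrete, the sketch below flattens a 2D feature map into 1D token sequences along several scan directions. This is a simplified, hypothetical illustration only: the function name `omnidirectional_scans` and the exact set of directions are our assumptions, and the actual RSM module additionally runs a selective state-space (Mamba) model over each sequence and merges the results, which is omitted here.

```python
import numpy as np

def omnidirectional_scans(x: np.ndarray) -> dict:
    """Flatten a 2D feature map into 1D sequences along multiple
    scan directions (hypothetical sketch of the scanning step;
    the per-sequence selective SSM and the merge are omitted)."""
    return {
        # left-to-right, top-to-bottom
        "row_forward": x.reshape(-1).copy(),
        # right-to-left, bottom-to-top (reverse of row_forward)
        "row_backward": x.reshape(-1)[::-1].copy(),
        # top-to-bottom, left-to-right (column-major order)
        "col_forward": x.T.reshape(-1).copy(),
        # bottom-to-top, right-to-left (reverse of col_forward)
        "col_backward": x.T.reshape(-1)[::-1].copy(),
    }

# Usage: a tiny 2x3 "feature map" with values 0..5.
feat = np.arange(6).reshape(2, 3)
scans = omnidirectional_scans(feat)
```

Each sequence has length H*W, so processing every direction with a linear-time state-space model keeps the overall cost linear in the number of pixels, in contrast to the quadratic cost of full self-attention over the same tokens.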