The spatial resolution of remote sensing images continues to increase, which makes large very-high-resolution (VHR) images difficult to handle in dense prediction tasks. Models based on convolutional neural networks are limited in their ability to model global features of remote sensing images because convolution operations are local. Transformer-based models, despite their global modeling capabilities, face computational challenges with large VHR images due to their quadratic complexity. The common practice of cropping large images into smaller patches leads to a significant loss of contextual information. To address these issues, we propose Remote Sensing Mamba (RSM) for dense prediction tasks in VHR remote sensing. RSM is designed to model global features of remote sensing images with linear complexity, enabling it to process large VHR images effectively. It employs an omnidirectional selective scan module to model the image globally along multiple directions, capturing large spatial features from each. Experiments on semantic segmentation and change detection tasks across various object types demonstrate the effectiveness of RSM. With a simple model architecture and training approach, RSM achieves state-of-the-art performance on dense prediction tasks in VHR remote sensing. The code for this work will be available at https://github.com/walking-shadow/Official_Remote_Sensing_Mamba.
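To make the idea of scanning an image in multiple directions more concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not state which directions the omnidirectional selective scan module uses, so the eight orders below (row, column, diagonal, anti-diagonal, each forward and reversed) are an assumption, and the names `scan_orders`, `attention_pairs`, and `scan_steps` are hypothetical helpers introduced only for illustration. The two counting functions contrast the quadratic number of token pairs in full self-attention with the linear number of steps taken by a fixed set of sequential scans.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (row, column) position of a token on the image grid


def scan_orders(height: int, width: int) -> Dict[str, List[Coord]]:
    """Return token visiting orders for eight assumed scan directions."""
    # Row-major: left to right within a row, rows from top to bottom.
    row = [(r, c) for r in range(height) for c in range(width)]
    # Column-major: top to bottom within a column, columns from left to right.
    col = [(r, c) for c in range(width) for r in range(height)]
    # Diagonal: tokens grouped by r + c (sorted() is stable, so row order
    # is preserved inside each diagonal).
    diag = sorted(row, key=lambda rc: rc[0] + rc[1])
    # Anti-diagonal: tokens grouped by r - c.
    anti = sorted(row, key=lambda rc: rc[0] - rc[1])

    forward = {"row": row, "col": col, "diag": diag, "anti": anti}
    orders = dict(forward)
    # Each forward order is also traversed in reverse, giving eight in total.
    for name, order in forward.items():
        orders[name + "_rev"] = order[::-1]
    return orders


def attention_pairs(height: int, width: int) -> int:
    """Token pairs compared by full self-attention: quadratic in token count."""
    n = height * width
    return n * n


def scan_steps(height: int, width: int, num_directions: int) -> int:
    """Sequential steps for a fixed number of scans: linear in token count."""
    return num_directions * height * width


if __name__ == "__main__":
    for name, order in scan_orders(2, 3).items():
        print(f"{name:9s} {order}")
    # A 64 x 64 token grid, e.g. a 1024 x 1024 image split into 16 x 16 patches.
    print(attention_pairs(64, 64), scan_steps(64, 64, 8))
```

Every order visits each token exactly once, so adding a direction adds one more pass over the sequence. For a fixed number of directions the total cost therefore grows in proportion to the number of tokens, whereas the pair count of self-attention grows with its square.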