The semantic segmentation of high-resolution remote sensing images plays a crucial role in downstream applications such as urban planning and disaster assessment. However, existing Transformer-based methods are constrained by a trade-off between accuracy and efficiency. To overcome this dilemma, we propose UNetMamba, a novel Mamba-based semantic segmentation model. It incorporates a Mamba Segmentation Decoder (MSD) that can efficiently decode the complex information within high-resolution images, and a Local Supervision Module (LSM), which is used only during training but significantly enhances the perception of local contents. Extensive experiments demonstrate that UNetMamba outperforms state-of-the-art methods, increasing mIoU by 0.87% on LoveDA and 0.36% on ISPRS Vaihingen, while achieving high efficiency through its lightweight design, low memory footprint, and low computational cost. The source code will soon be publicly available at https://github.com/EnzeZhu2001/UNetMamba.