Owing to diverse geographical environments, intricate landscapes, and high-density settlements, automatically identifying urban village boundaries in remote sensing images is highly challenging. This paper proposes UV-Mamba, a novel and efficient neural network model for accurate boundary detection in high-resolution remote sensing images. By incorporating deformable convolutions (DCN), UV-Mamba mitigates the memory loss problem in long-sequence modeling that arises in state space models (SSMs) as image size increases. Its encoder-decoder architecture comprises an encoder with four deformable state space augmentation (DSSA) blocks for efficient multi-level semantic extraction and a decoder that integrates the extracted semantic information. Experiments on the Beijing and Xi'an datasets show that UV-Mamba achieves state-of-the-art performance. Specifically, our model achieves 73.3% and 78.1% IoU on the Beijing and Xi'an datasets, respectively, representing improvements of 1.2% and 3.4% IoU over the previous best model, while also being 6x faster in inference and 40x smaller in parameter count. Source code and pre-trained models are available in the supplementary material.
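For readers unfamiliar with the reported metric, the following is a minimal sketch of intersection over union (IoU) for a binary segmentation mask, using the standard definition. It is an illustration only: the function name `binary_iou`, the toy masks, and the convention for two empty masks are assumptions, and the abstract does not state how the paper aggregates IoU across images or classes.

```python
from typing import Sequence


def binary_iou(pred: Sequence[Sequence[int]],
               target: Sequence[Sequence[int]]) -> float:
    """IoU between two binary masks of equal shape.

    Assumed labeling for illustration: 1 = urban village, 0 = background.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of rows")

    intersection = 0  # pixels positive in both masks
    union = 0         # pixels positive in at least one mask
    for pred_row, target_row in zip(pred, target):
        if len(pred_row) != len(target_row):
            raise ValueError("mask rows must have the same length")
        for p, t in zip(pred_row, target_row):
            p, t = bool(p), bool(t)
            intersection += p and t
            union += p or t

    # Convention chosen here (an assumption): two empty masks match perfectly.
    return intersection / union if union else 1.0


# Hypothetical 2x3 masks: 2 pixels overlap, 4 pixels are positive in either mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(binary_iou(pred, target))  # 2 / 4 = 0.5
```

Under this definition, a reported IoU of 73.3% means that the overlap between predicted and reference regions covers 73.3% of their combined area.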