Long-range dependency modeling is widely used in modern deep-learning-based semantic segmentation methods, especially those designed for large remote sensing images, to compensate for the intrinsic locality of standard convolutions. In previous studies, however, the long-range dependency, modeled with an attention mechanism or a transformer, has been learned in an unsupervised manner rather than under explicit supervision from the ground truth. In this paper, we propose a supervised long-range correlation method for land-cover classification, called the supervised long-range correlation network (SLCNet), which is shown to outperform the unsupervised strategies currently in use. In SLCNet, pixels of the same category are treated as highly correlated and pixels of different categories as weakly correlated, a relationship that can be supervised directly with the category consistency information available in the ground-truth semantic segmentation map. Under such supervision, the recalibrated features are more consistent for pixels of the same category and more discriminative between pixels of different categories, regardless of their proximity. To supply the fine detail that the global long-range correlation lacks, we introduce an auxiliary adaptive receptive field feature extraction module, placed in parallel with the long-range correlation module in the encoder, to capture detailed feature representations of multi-size objects in multi-scale remote sensing images. In addition, we apply multi-scale side-output supervision and a hybrid loss function as local and global constraints to further improve segmentation accuracy. In experiments on three remote sensing datasets, SLCNet achieved state-of-the-art performance on every dataset when compared with advanced segmentation methods from the computer vision, medical imaging, and remote sensing communities.
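The idea of supervising pixel-to-pixel correlation with category consistency can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the function names (`category_affinity`, `predicted_correlation`, `correlation_loss`) are hypothetical, the predicted correlation is taken to be a sigmoid of the feature dot product, and binary cross-entropy stands in for whatever loss the network actually uses.

```python
import math

def category_affinity(labels):
    """Ground-truth correlation target: 1.0 where two pixels share a category, else 0.0.

    `labels` is a flattened list of per-pixel category ids (length N);
    the result is an N x N nested list.
    """
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]

def predicted_correlation(features):
    """Pairwise correlation in (0, 1): sigmoid of the feature dot product (assumed form)."""
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))
    return [[sigmoid(sum(p * q for p, q in zip(f, g))) for g in features]
            for f in features]

def correlation_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted and ground-truth correlation maps."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
            count += 1
    return total / count

# Toy 2x2 label map flattened to 4 pixels with two categories (0 and 1).
labels = [0, 0, 1, 1]
# Hypothetical 2-D features that already separate the two categories.
good_features = [[2.0, -2.0], [2.0, -2.0], [-2.0, 2.0], [-2.0, 2.0]]
# Features that carry no category information.
flat_features = [[0.0, 0.0]] * 4

target = category_affinity(labels)
loss_good = correlation_loss(predicted_correlation(good_features), target)
loss_flat = correlation_loss(predicted_correlation(flat_features), target)
```

In this toy setting the target depends only on category membership, not on pixel position, so two distant pixels of the same class are pushed toward high correlation just as strongly as two adjacent ones. Features that separate the categories give a lower loss than uninformative features.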