Hi-ResNet: A High-Resolution Remote Sensing Network for Semantic Segmentation

High-resolution remote sensing (HRS) semantic segmentation extracts key objects from high-resolution coverage areas. However, objects of the same category in HRS images generally differ significantly in scale and shape across diverse geographical environments, which makes the data distribution difficult to fit. In addition, complex backgrounds make objects of different categories look similar, so a substantial number of objects are misclassified as background. These issues make existing learning algorithms sub-optimal. In this work, we address these problems by proposing a High-resolution remote sensing network (Hi-ResNet) with an efficient network structure, consisting sequentially of a funnel module, a multi-branch module built from stacks of information aggregation (IA) blocks, and a feature refinement module, together with a Class-agnostic Edge Aware (CEA) loss. First, the funnel module downsamples the initial input image, which reduces the computational cost, and extracts high-resolution semantic information from it. Second, we incrementally downsample the processed feature maps into multi-resolution branches to capture image features at different scales, and apply IA blocks, which capture key latent information by leveraging attention mechanisms, for effective feature aggregation, distinguishing image features of the same class that vary in scale and shape. Finally, our feature refinement module integrates the CEA loss function, which disambiguates inter-class objects with similar shapes and increases the data distribution distance for correct predictions. With effective pre-training strategies, we demonstrate the superiority of Hi-ResNet over state-of-the-art methods on three HRS segmentation benchmarks.
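As a rough illustration of the multi-resolution layout described above, the following sketch computes the feature-map size of each branch after the funnel module and the incremental downsampling. The abstract does not state the funnel stride, the number of branches, or the per-branch downsampling factor, so `funnel_stride=4`, `num_branches=4`, and the factor of 2 are assumptions chosen only for illustration, and the function name `branch_resolutions` is hypothetical.

```python
from typing import List, Tuple


def branch_resolutions(
    height: int,
    width: int,
    funnel_stride: int = 4,  # assumption: not stated in the abstract
    num_branches: int = 4,   # assumption: not stated in the abstract
) -> List[Tuple[int, int]]:
    """Return the (height, width) of each branch's feature map.

    The funnel module downsamples the input once; each later branch
    halves the resolution of the previous one (assumed factor of 2).
    """
    if height % funnel_stride or width % funnel_stride:
        raise ValueError("input size must be divisible by funnel_stride")

    # Resolution after the funnel module (the highest-resolution branch).
    h, w = height // funnel_stride, width // funnel_stride

    sizes = []
    for _ in range(num_branches):
        sizes.append((h, w))
        # Incremental downsampling for the next, coarser branch.
        h, w = max(h // 2, 1), max(w // 2, 1)
    return sizes


if __name__ == "__main__":
    for i, (h, w) in enumerate(branch_resolutions(512, 512)):
        print(f"branch {i}: {h} x {w}")
```

Under these assumed settings, a 512 x 512 input yields branches at 128, 64, 32, and 16 pixels per side; the actual Hi-ResNet configuration may differ.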
