DeepMerge: Deep-Learning-Based Region-Merging for Image Segmentation

Image segmentation partitions an image according to the objects in the scene and is a fundamental step in analysing very high spatial-resolution (VHR) remote sensing imagery. Current methods struggle to handle land objects of diverse shapes and sizes. In addition, segmentation scale parameters are usually set statically and empirically, which limits the segmentation of large-scale remote sensing images and yields algorithms with limited interpretability. To address these challenges, we propose DeepMerge, a deep-learning-based region-merging method that segments complete objects in large VHR images by integrating deep learning with a region adjacency graph (RAG). This is the first method to use deep learning to learn the similarity between adjacent super-pixels in a RAG and to merge those that are similar. We propose three components: a modified binary tree sampling method that generates shift-scale data as input to transformer-based deep learning networks; a shift-scale attention mechanism with three-dimensional relative position embedding that learns features across scales; and an embedding that fuses learned features with hand-crafted features. DeepMerge achieves high segmentation accuracy in a supervised manner on large-scale remotely sensed images and provides an interpretable optimal scale parameter. We validate it on a remote sensing image of 0.55 m resolution covering an area of 5,660 km^2. The experimental results show that DeepMerge achieves the highest F value (0.9550) and the lowest total error (TE, 0.0895), correctly segmenting objects of different sizes and outperforming all competing segmentation methods.
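The general idea of region merging on a RAG can be illustrated with a minimal sketch. This is not the DeepMerge implementation: the `similarity` function below is a hand-crafted placeholder (inverse difference of mean intensity) standing in for the score that DeepMerge learns with a transformer network, and the names `Region`, `merge_regions`, and `threshold` are hypothetical, introduced only for illustration. The sketch greedily merges the most similar pair of adjacent regions until no pair reaches the threshold.

```python
from dataclasses import dataclass


@dataclass
class Region:
    size: int    # number of pixels in the region
    mean: float  # mean intensity, a simple hand-crafted feature


def similarity(a: Region, b: Region) -> float:
    # Placeholder score in (0, 1]; an assumption for this sketch.
    # DeepMerge would use a learned similarity here instead.
    return 1.0 / (1.0 + abs(a.mean - b.mean))


def merge_regions(regions, edges, threshold):
    """Greedy region merging on a region adjacency graph (RAG).

    regions:   dict mapping region id -> Region
    edges:     iterable of (i, j) pairs of adjacent region ids
    threshold: minimum similarity required to merge a pair
    Returns the merged regions and a map from original id to final id.
    """
    regions = dict(regions)
    adj = {i: set() for i in regions}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    label = {i: i for i in regions}

    while True:
        # Score every adjacent pair once (i < j).
        candidates = [
            (similarity(regions[i], regions[j]), i, j)
            for i in adj for j in adj[i] if i < j
        ]
        if not candidates:
            break
        # Pick the most similar pair; ties go to the smallest ids.
        score, i, j = max(candidates, key=lambda c: (c[0], -c[1], -c[2]))
        if score < threshold:
            break

        # Merge region j into region i with a size-weighted mean.
        a, b = regions[i], regions[j]
        total = a.size + b.size
        regions[i] = Region(total, (a.mean * a.size + b.mean * b.size) / total)
        del regions[j]

        # Rewire j's neighbours to point at i.
        for k in adj.pop(j):
            adj[k].discard(j)
            if k != i:
                adj[k].add(i)
                adj[i].add(k)

        # Redirect every original id that pointed at j.
        for original, current in label.items():
            if current == j:
                label[original] = i

    return regions, label


if __name__ == "__main__":
    # Four super-pixels in a chain: two dark ones, then two bright ones.
    demo_regions = {
        0: Region(10, 0.10),
        1: Region(10, 0.12),
        2: Region(20, 0.90),
        3: Region(5, 0.88),
    }
    demo_edges = [(0, 1), (1, 2), (2, 3)]
    merged, labels = merge_regions(demo_regions, demo_edges, threshold=0.9)
    print(sorted(merged), labels)
```

In this sketch the threshold plays the role of a manually chosen scale parameter, which is the kind of static, empirical setting the abstract identifies as a limitation.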
