Diffusion models and multi-scale features are essential components of semantic segmentation for remote-sensing images: they sharpen segmentation boundaries and supply important contextual information. Diffusion models for segmentation frequently employ U-Net-like architectures. These designs include dense skip connections, which can make intermediate features hard to interpret and may not convey semantic information efficiently across the layers of the encoder-decoder. To address these challenges, we propose a new semantic segmentation model, a diffusion model with parallel multi-scale branches. The model consists of Parallel Multiscale Diffusion modules (P-MSDiff) and a Cross-Bridge Linear Attention mechanism (CBLA). P-MSDiff improves the understanding of semantic information at multiple levels of granularity and detects repetitive distribution data by integrating recursive denoising branches. It further facilitates data fusion by connecting the relevant branches to the main framework so that denoising proceeds concurrently. In addition, within the interconnected transformer architecture, the linear attention (LA) module is replaced with the CBLA module. CBLA integrates a semidefinite matrix linked to the query into the dot-product computation of keys and values, which allows queries to be adapted within the LA framework. This adjustment improves the structure of the multi-head attention computation and thereby network performance. CBLA is also a plug-and-play module. Our model demonstrates superior performance on the J1 metric on both the UAVid and Vaihingen Building datasets, with improvements of 1.60% and 1.40%, respectively, over strong baseline models.
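The idea of inserting a query-linked matrix into the key-value product of linear attention can be illustrated with a minimal sketch. This is not the CBLA module itself: the function name `query_bridged_linear_attention`, the `elu(x) + 1` feature map, and the choice of the matrix as the Gram matrix of the mapped queries (which is positive semidefinite by construction) are all assumptions made for illustration, and the abstract does not specify how the actual matrix is built.

```python
import math


def feature_map(x):
    # elu(x) + 1: a strictly positive feature map often used in linear attention.
    return x + 1.0 if x > 0 else math.exp(x)


def matmul(a, b):
    # Plain nested-list matrix product: (n x m) times (m x p) gives (n x p).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def query_bridged_linear_attention(q, k, v):
    """Hypothetical sketch. q and k are n x d lists, v is n x d_v; returns n x d_v."""
    n = len(q)
    phi_q = [[feature_map(x) for x in row] for row in q]
    phi_k = [[feature_map(x) for x in row] for row in k]

    # Assumed query-linked matrix: M = phi(Q)^T phi(Q) / n.
    # As a Gram matrix it is d x d and positive semidefinite.
    m = [[x / n for x in row] for row in matmul(transpose(phi_q), phi_q)]

    # Key-value summary, computed once: d x d_v, independent of sequence length.
    kv = matmul(transpose(phi_k), v)
    # Column sums of phi(K), stored as a d x 1 matrix for the normaliser.
    k_sum = [[sum(col)] for col in zip(*phi_k)]

    # Queries pass through M before meeting the key-value summary.
    bridged_q = matmul(phi_q, m)        # n x d
    numer = matmul(bridged_q, kv)       # n x d_v
    denom = matmul(bridged_q, k_sum)    # n x 1, strictly positive here
    return [[x / d[0] for x in row] for row, d in zip(numer, denom)]


if __name__ == "__main__":
    q = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
    k = [[0.1, 0.4], [-0.7, 1.2], [0.9, -0.5]]
    v = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    for row in query_bridged_linear_attention(q, k, v):
        print(row)
```

Because the key-value summary has a fixed size of d x d_v, the cost of this sketch grows linearly with the number of tokens rather than quadratically. With the positive feature map and the non-negative matrix chosen here, each output row is a weighted average of the rows of `v`.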