P-MSDiff: Parallel Multi-Scale Diffusion for Remote Sensing Image Segmentation

Diffusion models and multi-scale features are essential components of semantic segmentation for remote sensing images: they improve segmentation boundaries and supply significant contextual information. Diffusion models for segmentation frequently employ U-Net-like architectures. These designs include dense skip connections that can make intermediate features hard to interpret, so they may not convey semantic information efficiently across the layers of the encoder-decoder architecture. To address these challenges, we propose a new semantic segmentation model, a diffusion model with parallel multi-scale branches. The model consists of Parallel Multi-Scale Diffusion modules (P-MSDiff) and a Cross-Bridge Linear Attention mechanism (CBLA). P-MSDiff enhances the understanding of semantic information at multiple levels of granularity and detects repetitive distribution data by integrating recursive denoising branches. It further facilitates data fusion by connecting the relevant branches to the primary framework, which enables concurrent denoising. In addition, within the interconnected transformer architecture, the linear attention (LA) module is replaced by the CBLA module. CBLA integrates a semidefinite matrix linked to the query into the dot-product computation of keys and values, which lets queries adapt within the LA framework. This adjustment improves the structure of the multi-head attention computation and, in turn, network performance. CBLA is also a plug-and-play module. Measured by the J1 metric, our model outperforms strong baseline models on the UAVid and Vaihingen Building datasets by 1.60% and 1.40%, respectively.
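For readers unfamiliar with the linear attention (LA) module that CBLA replaces, the following is a minimal NumPy sketch of generic kernelized linear attention. It is not the paper's CBLA and does not include the query-linked semidefinite matrix, whose exact form the abstract does not give. The `elu(x) + 1` feature map and the names `feature_map`, `linear_attention`, and `quadratic_attention` are assumptions chosen for illustration.

```python
import numpy as np

def feature_map(x):
    # Assumed positive feature map, elu(x) + 1; the abstract does not specify one.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(q, k, v):
    """Generic linear attention: compute phi(K)^T V first, then apply the queries.

    q, k: arrays of shape (n, d); v: array of shape (n, d_v).
    Cost grows linearly with the sequence length n.
    """
    q_f, k_f = feature_map(q), feature_map(k)
    kv = k_f.T @ v                 # (d, d_v) summary of keys and values
    z = k_f.sum(axis=0)            # (d,) normalizer accumulated over keys
    return (q_f @ kv) / (q_f @ z)[:, None]

def quadratic_attention(q, k, v):
    """Same attention computed through the explicit (n, n) score matrix."""
    q_f, k_f = feature_map(q), feature_map(k)
    scores = q_f @ k_f.T
    return (scores / scores.sum(axis=1, keepdims=True)) @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(6, 4))
k = rng.normal(size=(6, 4))
v = rng.normal(size=(6, 3))
out = linear_attention(q, k, v)
```

Both functions return the same values up to floating-point error. The linear form reorders the matrix products so that the (n, n) score matrix is never materialized, and the key-value product `kv` is the computation into which the abstract says CBLA inserts its query-linked matrix.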

