To fully exploit spatial information and to handle regions with large grayscale variations in remote sensing image segmentation, we propose SFFNet (Spatial and Frequency Domain Fusion Network). The framework uses a two-stage design: the first stage extracts features with spatial methods to obtain rich spatial detail and semantic information; the second stage maps these features in both the spatial and frequency domains. For the frequency-domain mapping, we introduce the Wavelet Transform Feature Decomposer (WTFD), which decomposes features into low-frequency and high-frequency components using the Haar wavelet transform and integrates them with the spatial features. To bridge the semantic gap between frequency and spatial features, and to select salient features so that features from the two representation domains combine well, we design the Multiscale Dual-Representation Alignment Filter (MDAF), which uses multiscale convolutions and dual cross-attention. Comprehensive experiments show that SFFNet outperforms existing methods in terms of mIoU, reaching 84.80% and 87.73% on the two evaluated datasets, respectively. The code is available at https://github.com/yysdck/SFFNet.
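The Haar decomposition that WTFD builds on can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository above for that); it shows only a generic single-level 2D Haar transform that splits a feature map into one low-frequency band and three high-frequency bands. The function names `haar_dwt2` and `haar_idwt2`, the `(C, H, W)` feature layout, and the band names are assumptions made for illustration, and how WTFD integrates these bands with the spatial features is not covered here.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform over the last two axes.

    x has shape (..., H, W) with even H and W. Returns (low, (h1, h2, h3)),
    each of shape (..., H/2, W/2). Band names are illustrative; conventions
    for labelling the high-frequency bands differ between libraries.
    """
    x = np.asarray(x, dtype=np.float64)
    h, w = x.shape[-2:]
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    a = x[..., 0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    low = (a + b + c + d) / 2.0  # scaled local sum: low-frequency content
    h1 = (a + b - c - d) / 2.0   # top row minus bottom row
    h2 = (a - b + c - d) / 2.0   # left column minus right column
    h3 = (a - b - c + d) / 2.0   # diagonal difference
    return low, (h1, h2, h3)


def haar_idwt2(low, highs):
    """Inverse of haar_dwt2; reconstructs the original array exactly."""
    h1, h2, h3 = highs
    out = np.empty(low.shape[:-2] + (low.shape[-2] * 2, low.shape[-1] * 2))
    out[..., 0::2, 0::2] = (low + h1 + h2 + h3) / 2.0
    out[..., 0::2, 1::2] = (low + h1 - h2 - h3) / 2.0
    out[..., 1::2, 0::2] = (low - h1 + h2 - h3) / 2.0
    out[..., 1::2, 1::2] = (low - h1 - h2 + h3) / 2.0
    return out


# Hypothetical stage-one feature map with 8 channels at 16x16 resolution.
features = np.random.default_rng(0).normal(size=(8, 16, 16))
low, highs = haar_dwt2(features)
restored = haar_idwt2(low, highs)
```

Because the transform is orthonormal, the four bands together keep all the information in the input, so `restored` matches `features` up to floating-point error. Smooth regions end up mostly in `low`, while abrupt intensity changes show up in the high-frequency bands.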