Clouds in remote sensing images inevitably degrade information extraction, which hinders subsequent analysis of satellite images. Hence, cloud detection is a necessary preprocessing step. However, existing methods are computationally expensive and have large numbers of parameters. In this letter, a lightweight CNN-Transformer network, CD-CTFM, is proposed to address this problem. CD-CTFM is based on an encoder-decoder architecture and incorporates an attention mechanism. In the encoder, we use a lightweight network combining a CNN and a Transformer as the backbone, which allows local and global features to be extracted simultaneously. Moreover, a lightweight feature pyramid module is designed to fuse multiscale features with contextual information. In the decoder, we integrate a lightweight channel-spatial attention module into each skip connection between the encoder and decoder, extracting low-level features while suppressing irrelevant information without introducing many parameters. Finally, the proposed model is evaluated on two cloud datasets, 38-Cloud and MODIS. The results demonstrate that CD-CTFM achieves accuracy comparable to that of state-of-the-art methods while outperforming them in efficiency.
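To make the idea of an attention-gated skip connection concrete, the following is a minimal NumPy sketch. It is not the CD-CTFM module, whose internal design is not given here; it assumes a generic channel gate followed by a spatial gate, and the names `channel_spatial_attention`, `gated_skip`, `w_channel`, and `w_spatial` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_spatial_attention(feat, w_channel, w_spatial):
    """Gate a (C, H, W) feature map by channel, then by spatial position.

    Assumed design, not the paper's: w_channel is a (C, C) matrix and
    w_spatial holds two scalars that mix the channel-wise mean and max maps.
    """
    # Channel gate: global average pool -> linear map -> sigmoid.
    pooled = feat.mean(axis=(1, 2))              # shape (C,)
    channel_gate = sigmoid(w_channel @ pooled)   # shape (C,)
    feat = feat * channel_gate[:, None, None]

    # Spatial gate: per-pixel mean and max over channels -> sigmoid.
    mean_map = feat.mean(axis=0)                 # shape (H, W)
    max_map = feat.max(axis=0)                   # shape (H, W)
    spatial_gate = sigmoid(w_spatial[0] * mean_map + w_spatial[1] * max_map)
    return feat * spatial_gate[None, :, :]


def gated_skip(encoder_feat, decoder_feat, w_channel, w_spatial):
    """Apply attention to the encoder feature, then concatenate it with
    the decoder feature along the channel axis."""
    gated = channel_spatial_attention(encoder_feat, w_channel, w_spatial)
    return np.concatenate([gated, decoder_feat], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    enc = rng.standard_normal((4, 8, 8))   # low-level encoder feature
    dec = rng.standard_normal((4, 8, 8))   # upsampled decoder feature
    fused = gated_skip(enc, dec, rng.standard_normal((4, 4)), np.array([0.5, 0.5]))
    print(fused.shape)
```

Because both gates are sigmoid outputs in (0, 1), they can only attenuate encoder activations, and in this sketch the added parameters amount to one C x C matrix and two scalars per skip connection.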