Remote sensing change detection (RSCD) is a complex task in which changes often appear at different scales and orientations. Convolutional neural networks (CNNs) capture local spatial patterns well but cannot model global semantics due to their limited receptive fields. Transformers, in contrast, can model long-range dependencies but are data-hungry, and RSCD datasets are not large enough to train them effectively. To tackle this, this paper presents a new RSCD architecture that adapts the Segment Anything Model (SAM), a vision foundation model, and processes features from the SAM encoder through a multi-receptive-field ensemble to capture both local and global change patterns. We propose an ensemble of spatial-temporal feature enhancement (STFE) modules to capture cross-temporal relations, a decoder to reconstruct change patterns, and a multi-scale decoder fusion with attention (MSDFA) module to fuse multi-scale decoder information and highlight key change patterns. Each branch in the ensemble operates on a separate receptive field to capture finer-to-coarser details. Additionally, we propose a novel cross-entropy masking (CEM) loss to handle class imbalance in RSCD datasets. Our method outperforms state-of-the-art (SOTA) methods on four change detection datasets: LEVIR-CD, WHU-CD, CLCD, and S2Looking. On the challenging S2Looking dataset, we achieve a 2.97\% F1-score improvement. The code is available at: https://github.com/humza909/SAM-ECEM