Convolutional neural networks (CNN) and Transformers have made impressive progress in the field of remote sensing change detection (CD). However, both architectures have their inherent shortcomings. Recently, the Mamba architecture, based on spatial state models, has shown remarkable performance in a series of natural language processing tasks, which can effectively compensate for the shortcomings of the above two architectures. In this paper, we explore for the first time the potential of the Mamba architecture for remote sensing change detection tasks. We tailor the corresponding frameworks, called MambaBCD, MambaSCD, and MambaBDA, for binary change detection (BCD), semantic change detection (SCD), and building damage assessment (BDA), respectively. All three frameworks adopt the cutting-edge visual Mamba architecture as the encoder, which allows full learning of global spatial contextual information from the input images. For the change decoder, which is available in all three architectures, we propose three spatio-temporal relationship modeling mechanisms, which can be naturally combined with the Mamba architecture and fully utilize its attribute to achieve spatio-temporal interaction of multi-temporal features and obtain accurate change information. On five benchmark datasets, our proposed frameworks outperform current CNN- and Transformer-based approaches without using any complex strategies or tricks, fully demonstrating the potential of the Mamba architecture. Specifically, we obtained 83.11%, 88.39% and 94.19% F1 scores on the three BCD datasets SYSU, LEVIR-CD+, and WHU-CD; on the SCD dataset SECOND, we obtained 24.04% SeK; and on the xBD dataset, we obtained 81.41% overall F1 score. The source code will be available in https://github.com/ChenHongruixuan/MambaCD
翻译:卷积神经网络(CNN)与Transformer在遥感变化检测(CD)领域取得了显著进展,但两种架构均存在固有缺陷。近期,基于空间状态模型的Mamba架构在一系列自然语言处理任务中展现出卓越性能,可有效弥补上述两种架构的不足。本文首次探索了Mamba架构在遥感变化检测任务中的潜力,并分别为二元变化检测(BCD)、语义变化检测(SCD)和建筑物损伤评估(BDA)定制了相应框架,命名为MambaBCD、MambaSCD与MambaBDA。三个框架均采用前沿的视觉Mamba架构作为编码器,能够从输入图像中全面学习全局空间上下文信息。针对三种架构共有的变化解码器,我们提出了三种时空关系建模机制,这些机制可自然融入Mamba架构并充分利用其特性,实现多时相特征的时空交互,从而获取精准的变化信息。在五个基准数据集上,我们所提出的框架无需采用任何复杂策略或技巧,即优于当前基于CNN和Transformer的方法,充分展现了Mamba架构的潜力。具体而言,在三个BCD数据集SYSU、LEVIR-CD+和WHU-CD上分别获得了83.11%、88.39%和94.19%的F1分数;在SCD数据集SECOND上获得了24.04%的SeK指标;在xBD数据集上获得了81.41%的整体F1分数。源代码将在https://github.com/ChenHongruixuan/MambaCD 提供。