Change detection in remote sensing imagery is essential for a variety of applications such as urban planning, disaster management, and climate research. However, existing methods for identifying semantically changed areas overlook the availability of semantic information in the form of existing maps describing features of the Earth's surface. In this paper, we leverage this information for change detection in bi-temporal images. We show that simply integrating the additional information via concatenation of latent representations suffices to significantly outperform state-of-the-art change detection methods. Motivated by this observation, we propose the new task of *Conditional Change Detection*, where pre-change semantic information is used as input alongside the bi-temporal images. To fully exploit the extra information, we propose *MapFormer*, a novel architecture based on a multi-modal feature fusion module that allows for feature processing conditioned on the available semantic information. We further employ a supervised, cross-modal contrastive loss to guide the learning of visual representations. Our approach outperforms existing change detection methods by an absolute 11.7% and 18.4% in terms of binary change IoU on DynamicEarthNet and HRSCD, respectively. Furthermore, we demonstrate the robustness of our approach to the quality of the pre-change semantic information and to the absence of pre-change imagery. The code is available at https://github.com/mxbh/mapformer.
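For readers unfamiliar with the reported metric, the following is a minimal sketch of binary change IoU, assuming the standard definition TP / (TP + FP + FN) for the change class over a pair of binary masks. The function name `binary_change_iou`, the nested-list mask format, and the convention for empty masks are illustrative assumptions and are not taken from the MapFormer code base, whose evaluation protocol (for example, how counts are accumulated over a dataset) may differ.

```python
from typing import Sequence


def binary_change_iou(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
) -> float:
    """IoU of the 'change' class (label 1) between two binary masks.

    Hypothetical helper for illustration; masks are nested lists of 0/1.
    """
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == 1 and t == 1:
                tp += 1  # change predicted and present
            elif p == 1 and t == 0:
                fp += 1  # change predicted but absent
            elif p == 0 and t == 1:
                fn += 1  # change present but missed
    union = tp + fp + fn
    # Assumed convention: if neither mask contains change, report 1.0.
    return tp / union if union > 0 else 1.0


# Toy 2x3 masks: 2 true positives, 1 false positive, 1 false negative.
pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
print(binary_change_iou(pred, target))  # 2 / (2 + 1 + 1) = 0.5
```

Under this definition, an absolute improvement of 11.7% corresponds to the IoU rising by 11.7 percentage points, not to a relative gain over the baseline value.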