Change detection in remote sensing imagery is essential for a variety of applications such as urban planning, disaster management, and climate research. However, existing methods for identifying semantically changed areas overlook the semantic information that is already available in the form of existing maps describing features of the earth's surface. In this paper, we leverage this information for change detection in bi-temporal images. We show that simply integrating the additional information by concatenating latent representations suffices to significantly outperform state-of-the-art change detection methods. Motivated by this observation, we propose the new task of Conditional Change Detection, where pre-change semantic information is used as input alongside bi-temporal images. To fully exploit the extra information, we propose MapFormer, a novel architecture based on a multi-modal feature fusion module that allows for feature processing conditioned on the available semantic information. We further employ a supervised, cross-modal contrastive loss to guide the learning of visual representations. Our approach outperforms existing change detection methods by an absolute 11.7% and 18.4% in terms of binary change IoU on DynamicEarthNet and HRSCD, respectively. Furthermore, we demonstrate the robustness of our approach to the quality of the pre-change semantic information and to the absence of pre-change imagery. The code will be made publicly available.
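The results above are reported as absolute gains in binary change IoU. As a reading aid, the following minimal sketch shows how such a metric is commonly computed on a pair of binary masks, and what an absolute difference in percentage points means. It is not taken from the MapFormer code: the function name `binary_change_iou`, the NaN convention for empty masks, and the toy masks are all assumptions made for illustration.

```python
import numpy as np


def binary_change_iou(pred, target):
    """Intersection over union of the 'change' class for two binary masks.

    Hypothetical helper for illustration; the paper's evaluation code may
    aggregate over a whole dataset rather than per mask.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # Assumed convention: the score is undefined (NaN) when neither mask
    # contains any changed pixel.
    if union == 0:
        return float("nan")
    return float(intersection) / float(union)


# Toy masks (1 = changed pixel); these are not data from the paper.
target = np.array([[1, 0, 0],
                   [0, 1, 1]])
pred_a = np.array([[1, 1, 0],
                   [0, 1, 0]])
pred_b = np.array([[1, 0, 0],
                   [0, 1, 1]])

iou_a = binary_change_iou(pred_a, target)  # 2 shared pixels / 4 in the union
iou_b = binary_change_iou(pred_b, target)  # identical masks

# An absolute gain is the plain difference of the two scores, expressed in
# percentage points, rather than a ratio relative to the baseline.
absolute_gain_points = 100.0 * (iou_b - iou_a)
print(iou_a, iou_b, absolute_gain_points)
```

With this reading, an absolute improvement of 11.7% corresponds to the change IoU rising by 11.7 percentage points over the compared method, not to a relative increase of 11.7% of the baseline score.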