Vision Foundation Models (VFMs) such as the Segment Anything Model (SAM) enable zero-shot and interactive segmentation of visual content and have therefore been rapidly adopted across a variety of visual scenes. However, applying them directly to Remote Sensing (RS) tasks often yields unsatisfactory results because of the special imaging characteristics of RS images. In this work, we aim to exploit the strong visual recognition capabilities of VFMs to improve Change Detection (CD) in High-Resolution (HR) Remote Sensing Images (RSIs). We employ the visual encoder of FastSAM, an efficient variant of SAM, to extract visual representations in RS scenes. To adapt FastSAM to focus on specific ground objects in RS scenes, we propose a convolutional adaptor that aggregates task-oriented change information. Moreover, to exploit the semantic representations inherent in SAM features, we introduce a task-agnostic semantic learning branch that models the latent semantics in bi-temporal RSIs. The resulting method, SAMCD, achieves higher accuracy than state-of-the-art (SOTA) methods and exhibits a sample-efficient learning ability comparable to that of semi-supervised CD methods. To the best of our knowledge, this is the first work to adapt VFMs for CD in HR RSIs.
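The data flow summarized in the abstract (one frozen encoder shared by both acquisition dates, a lightweight adaptor, a change head, and a per-date semantic branch) can be illustrated with a small forward-pass sketch. This is not the authors' implementation: `FrozenEncoder` is a hypothetical stand-in for the FastSAM encoder, pointwise (1x1) convolutions are assumed in place of whatever convolutions the actual adaptor uses, and all class names, channel sizes, and random weights are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1x1(x, w, b):
    """Pointwise convolution: x is (C_in, H, W), w is (C_out, C_in), b is (C_out,)."""
    return np.einsum("oc,chw->ohw", w, x) + b[:, None, None]


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class FrozenEncoder:
    """Hypothetical stand-in for a pretrained VFM encoder (NOT FastSAM itself).

    Its weights are fixed at construction and never updated.
    """

    def __init__(self, in_ch=3, feat_ch=8, stride=4):
        self.w = rng.standard_normal((feat_ch, in_ch)) * 0.1
        self.b = np.zeros(feat_ch)
        self.stride = stride

    def __call__(self, img):
        c, h, w = img.shape
        s = self.stride  # assumes h and w are divisible by s
        # Average-pool by `stride`, then project to feat_ch channels.
        pooled = img.reshape(c, h // s, s, w // s, s).mean(axis=(2, 4))
        return relu(conv1x1(pooled, self.w, self.b))


class SketchCD:
    """Illustrative bi-temporal pipeline; only the non-encoder weights would be trained."""

    def __init__(self, feat_ch=8, adapt_ch=4, sem_ch=3):
        self.encoder = FrozenEncoder(feat_ch=feat_ch)
        # Adaptor weights (assumed 1x1 convolution for brevity).
        self.w_adapt = rng.standard_normal((adapt_ch, feat_ch)) * 0.1
        self.b_adapt = np.zeros(adapt_ch)
        # Change head operating on concatenated bi-temporal features.
        self.w_change = rng.standard_normal((1, 2 * adapt_ch)) * 0.1
        self.b_change = np.zeros(1)
        # Semantic branch applied to each date separately.
        self.w_sem = rng.standard_normal((sem_ch, adapt_ch)) * 0.1
        self.b_sem = np.zeros(sem_ch)

    def adapt(self, feats):
        return relu(conv1x1(feats, self.w_adapt, self.b_adapt))

    def forward(self, img_t1, img_t2):
        # The same encoder and adaptor process both dates (shared weights).
        f1 = self.adapt(self.encoder(img_t1))
        f2 = self.adapt(self.encoder(img_t2))
        fused = np.concatenate([f1, f2], axis=0)
        change = sigmoid(conv1x1(fused, self.w_change, self.b_change))[0]
        sem1 = conv1x1(f1, self.w_sem, self.b_sem)
        sem2 = conv1x1(f2, self.w_sem, self.b_sem)
        return change, sem1, sem2


# Two random 3-channel 32x32 "images" standing in for a bi-temporal pair.
img_t1 = rng.random((3, 32, 32))
img_t2 = rng.random((3, 32, 32))
model = SketchCD()
change, sem1, sem2 = model.forward(img_t1, img_t2)
print(change.shape, sem1.shape, sem2.shape)
```

In this sketch the encoder is only ever called, never modified, which mirrors the idea of reusing pretrained features while learning a small number of task-specific parameters; how SAMCD actually trains its adaptor and semantic branch is not specified in the abstract.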