Vision Foundation Models (VFMs) such as the Segment Anything Model (SAM) enable zero-shot or interactive segmentation of visual content and have therefore been rapidly adopted across a variety of visual scenes. However, applying them directly to Remote Sensing (RS) tasks often yields unsatisfactory results because of the special imaging characteristics of RS images. In this work, we exploit the strong visual recognition capabilities of VFMs to improve Change Detection (CD) in High-Resolution (HR) Remote Sensing Images (RSIs). We employ the visual encoder of FastSAM, an efficient variant of SAM, to extract visual representations in RS scenes. To adapt FastSAM to focus on specific ground objects in RS scenes, we propose a convolutional adaptor that aggregates task-oriented change information. Moreover, to exploit the semantic representations inherent in SAM features, we introduce a task-agnostic semantic learning branch that models the latent semantics of bi-temporal RSIs. The resulting method, SAMCD, achieves higher accuracy than state-of-the-art (SOTA) methods and exhibits a sample-efficient learning ability comparable to that of semi-supervised CD methods. To the best of our knowledge, this is the first work that adapts VFMs to CD in HR RSIs.