Zero-Shot Refinement of Buildings' Segmentation Models using SAM

Foundation models have excelled in a wide range of tasks, but they are usually evaluated on general benchmarks. Adapting them to specific domains, such as remote sensing imagery, remains underexplored. In remote sensing, precise building instance segmentation is vital for applications such as urban planning. Convolutional Neural Networks (CNNs) perform well on this task, but their generalization can be limited. To this end, we present a novel approach that adapts foundation models to address the generalization shortfall of existing models. Among several models, we focus on the Segment Anything Model (SAM), a powerful foundation model known for class-agnostic image segmentation. We start by identifying the limitations of SAM and show that it performs suboptimally on remote sensing imagery. Moreover, SAM offers no recognition abilities and therefore cannot classify or tag the objects it localizes. To address these limitations, we introduce different prompting strategies, including the integration of a pre-trained CNN as a prompt generator. This approach augments SAM with recognition abilities, a first of its kind. We evaluate our method on three remote sensing datasets: the WHU Buildings dataset, the Massachusetts Buildings dataset, and the AICrowd Mapping Challenge. For out-of-distribution performance on the WHU dataset, we achieve a 5.47% increase in IoU and a 4.81% improvement in F1-score. For in-distribution performance on the WHU dataset, we observe increases of 2.72% and 1.58% in True-Positive-IoU and True-Positive-F1 score, respectively. Our code is publicly available at https://github.com/geoaigroup/GEOAI-ECRS2023, and we hope it inspires further exploration of foundation models for domain-specific tasks within the remote sensing community.
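To make the idea of a CNN acting as a prompt generator concrete, the following is a minimal sketch of one plausible way to turn a CNN's binary building mask into box prompts, one per connected component. It is not the authors' implementation: the abstract does not say which prompt types are used, so the choice of bounding boxes, the function name `mask_to_box_prompts`, the `min_area` filter, and the toy mask are all assumptions made for illustration.

```python
from collections import deque
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max), inclusive pixel coordinates.
Box = Tuple[int, int, int, int]


def mask_to_box_prompts(mask: List[List[int]], min_area: int = 1) -> List[Box]:
    """Return one bounding box per 4-connected foreground component of a binary mask."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill over a single component,
            # tracking its extent and pixel count as we go.
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            x_min = x_max = x0
            y_min = y_max = y0
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Assumed heuristic: drop tiny components that are likely CNN noise.
            if area >= min_area:
                boxes.append((x_min, y_min, x_max, y_max))
    return boxes


# Toy stand-in for a thresholded CNN prediction containing two "buildings".
cnn_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
boxes = mask_to_box_prompts(cnn_mask)
print(boxes)  # [(0, 0, 1, 1), (4, 1, 4, 2)]
```

Each resulting box could then be handed to a promptable segmenter. In the `segment_anything` package, for instance, `SamPredictor.predict` accepts a `box` argument in XYXY format. Because every box originates from a region the CNN already labelled as a building, the refined mask inherits that class label, which is one way to read the abstract's claim that the CNN supplies the recognition SAM lacks.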
