The Segment Anything Model (SAM) performs impressively on natural image segmentation. However, it struggles with geographical images like aerial and satellite imagery, especially when segmenting mobility infrastructure including roads, sidewalks, and crosswalks. This weaker performance stems from the narrow shape of these objects, textures that blend into their surroundings, and interference from objects like trees, buildings, vehicles, and pedestrians, all of which can mislead the model into producing inaccurate segmentation maps. To address these challenges, we propose Geographical SAM (GeoSAM), a novel SAM-based framework that fine-tunes SAM using a dense visual prompt from zero-shot learning and a sparse visual prompt from a pre-trained CNN segmentation model. GeoSAM outperforms existing approaches to geographical image segmentation by 26% on road infrastructure, 7% on pedestrian infrastructure, and 17% on average. This represents a significant step in leveraging foundation models to segment mobility infrastructure, including both road and pedestrian infrastructure, in geographical images. The source code is available on GitHub: https://github.com/rafiibnsultan/GeoSAM/tree/main.
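The abstract does not say how the sparse prompt is derived from the CNN's output. As one illustrative possibility, which is an assumption and not necessarily GeoSAM's procedure, point prompts can be sampled from a CNN foreground-probability map. Foreground points (label 1) are drawn from predicted infrastructure pixels and background points (label 0) from everything else. The helper names `sparse_prompts_from_mask` and `_sample_pixels`, along with the threshold and point counts, are hypothetical choices made for this sketch.

```python
import numpy as np


def _sample_pixels(rng, pixels, k):
    """Draw up to k distinct (row, col) pixels without replacement."""
    k = min(k, len(pixels))
    if k == 0:
        return np.empty((0, 2), dtype=np.int64)
    idx = rng.choice(len(pixels), size=k, replace=False)
    return pixels[idx]


def sparse_prompts_from_mask(prob_map, num_pos=5, num_neg=5, threshold=0.5, seed=0):
    """Hypothetical helper (not from the GeoSAM code): turn a CNN
    foreground-probability map (H x W) into SAM-style point prompts.

    Returns (coords, labels): coords is an (N, 2) float32 array of (x, y)
    pixel positions; labels is an (N,) int array where 1 marks a foreground
    point (e.g. road or sidewalk) and 0 marks a background point.
    """
    rng = np.random.default_rng(seed)
    prob_map = np.asarray(prob_map, dtype=np.float32)
    fg = np.argwhere(prob_map >= threshold)  # (row, col) of predicted infrastructure
    bg = np.argwhere(prob_map < threshold)   # (row, col) of everything else
    pos = _sample_pixels(rng, fg, num_pos)
    neg = _sample_pixels(rng, bg, num_neg)
    # argwhere yields (row, col); point prompts are expressed as (x, y) = (col, row).
    coords = np.concatenate([pos, neg], axis=0)[:, ::-1].astype(np.float32)
    labels = np.concatenate([np.ones(len(pos), dtype=np.int64),
                             np.zeros(len(neg), dtype=np.int64)])
    return coords, labels


if __name__ == "__main__":
    # Toy 10x10 map with a narrow two-pixel-wide "road" in columns 4-5.
    toy = np.zeros((10, 10), dtype=np.float32)
    toy[:, 4:6] = 0.9
    coords, labels = sparse_prompts_from_mask(toy, num_pos=3, num_neg=4)
    print(coords.shape, labels.tolist())  # (7, 2) [1, 1, 1, 0, 0, 0, 0]
```

The returned arrays have the shapes that `SamPredictor.predict` in the `segment_anything` package accepts for its `point_coords` and `point_labels` arguments. How GeoSAM actually selects its prompts and combines them with the dense prompt is specified in the paper and repository, not in this sketch.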