GeoSAM: Fine-tuning SAM with Sparse and Dense Visual Prompting for Automated Segmentation of Mobility Infrastructure

The Segment Anything Model (SAM) has shown impressive performance when applied to natural image segmentation. However, it struggles with geographical images like aerial and satellite imagery, especially when segmenting mobility infrastructure including roads, sidewalks, and crosswalks. This inferior performance stems from the narrow features of these objects, their textures blending into the surroundings, and interference from objects like trees, buildings, vehicles, and pedestrians - all of which can disorient the model to produce inaccurate segmentation maps. To address these challenges, we propose Geographical SAM (GeoSAM), a novel SAM-based framework that implements a fine-tuning strategy using the dense visual prompt from zero-shot learning, and the sparse visual prompt from a pre-trained CNN segmentation model. The proposed GeoSAM outperforms existing approaches for geographical image segmentation, specifically by 26%, 7%, and 17% for road infrastructure, pedestrian infrastructure, and on average, respectively, representing a momentous leap in leveraging foundation models to segment mobility infrastructure including both road and pedestrian infrastructure in geographical images. The source code can be found on this GitHub repository: https://github.com/rafiibnsultan/GeoSAM/tree/main.

翻译：分割一切模型（SAM）在自然图像分割任务中表现出色，但在处理航空和卫星等地理图像时面临挑战，尤其在对道路、人行道及斑马线等出行基础设施进行分割时表现欠佳。这一性能缺陷源于目标对象特征狭窄、纹理与周边环境交融，以及树木、建筑、车辆和行人等干扰物导致模型产生不准确分割图。为解决上述难题，我们提出地理SAM（GeoSAM）——一种基于SAM的新型框架，通过零样本学习的密集视觉提示与预训练CNN分割模型的稀疏视觉提示实现微调策略。所提出的GeoSAM在地理图像分割领域超越现有方法：道路基础设施分割性能提升26%，行人基础设施提升7%，整体平均提升17%。这标志着利用基础模型分割地理图像中道路与行人出行基础设施领域取得的重大突破。源代码已发布于GitHub仓库：https://github.com/rafiibnsultan/GeoSAM/tree/main。

相关内容

Automator

关注 5

Automator是苹果公司为他们的Mac OS X系统开发的一款软件。 只要通过点击拖拽鼠标等操作就可以将一系列动作组合成一个工作流，从而帮助你自动的（可重复的）完成一些复杂的工作。Automator还能横跨很多不同种类的程序，包括：查找器、Safari网络浏览器、iCal、地址簿或者其他的一些程序。它还能和一些第三方的程序一起工作，如微软的Office、Adobe公司的Photoshop或者Pixelmator等。

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

专知会员服务

13+阅读 · 2022年3月12日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日