In geographical image segmentation, performance is often constrained by the limited availability of training data and a lack of generalizability, particularly for segmenting mobility infrastructure such as roads, sidewalks, and crosswalks. Vision foundation models like the Segment Anything Model (SAM), pre-trained on millions of natural images, have demonstrated impressive zero-shot segmentation performance and thus offer a potential solution. However, SAM struggles with geographical images such as aerial and satellite imagery, both because its training was confined to natural images and because the narrow features and textures of mobility infrastructure blend into the surroundings. To address these challenges, we propose Geographical SAM (GeoSAM), a SAM-based framework that fine-tunes SAM with automatically generated multi-modal prompts: point prompts from a pre-trained task-specific model serve as primary visual guidance, while text prompts from a large language model serve as secondary semantic guidance to enhance model comprehension. GeoSAM outperforms existing approaches for mobility infrastructure segmentation in both familiar and completely unseen regions by at least 5\% in mIoU, representing a significant leap in leveraging foundation models to segment mobility infrastructure, including both road and pedestrian infrastructure, in geographical images. The source code can be found in this GitHub repository: https://github.com/rafiibnsultan/GeoSAM.
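To illustrate the kind of automatic point-prompt generation the abstract describes, the sketch below samples positive and negative click points from a task-specific model's per-pixel foreground probability map, using SAM's label convention (1 = foreground, 0 = background). This is a minimal hypothetical helper, not GeoSAM's actual sampling procedure; the function name, thresholds, and point counts are assumptions for illustration.

```python
import numpy as np

def sample_point_prompts(prob_map, n_pos=3, n_neg=3,
                         fg_thresh=0.8, bg_thresh=0.2, seed=0):
    """Sample point prompts from a pre-trained task-specific model's
    foreground probability map (hypothetical sketch; GeoSAM's exact
    sampling scheme may differ).

    Returns (points, labels): points are (N, 2) row/col coordinates,
    labels follow SAM's convention (1 = foreground, 0 = background).
    """
    rng = np.random.default_rng(seed)
    fg = np.argwhere(prob_map >= fg_thresh)  # confident infrastructure pixels
    bg = np.argwhere(prob_map <= bg_thresh)  # confident background pixels
    pos = fg[rng.choice(len(fg), size=min(n_pos, len(fg)), replace=False)]
    neg = bg[rng.choice(len(bg), size=min(n_neg, len(bg)), replace=False)]
    points = np.concatenate([pos, neg], axis=0)
    labels = np.array([1] * len(pos) + [0] * len(neg))
    return points, labels

# Toy probability map: a confident 3x3 "sidewalk" patch on background.
prob = np.zeros((8, 8))
prob[2:5, 2:5] = 0.95
points, labels = sample_point_prompts(prob, n_pos=2, n_neg=2)
```

Note that SAM's predictor expects point coordinates in (x, y) order, so the (row, col) indices above would be swapped before being passed as prompts alongside the text guidance.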