Multimodal remote sensing data, acquired from diverse sensors, offer a comprehensive and integrated view of the Earth's surface. With multimodal fusion, semantic segmentation can analyze geographic scenes in more detail and more accurately than single-modality approaches. Building on advances in vision foundation models, particularly the Segment Anything Model (SAM), this study proposes a unified framework built around a novel Multimodal Fine-tuning Network (MFNet) for remote sensing semantic segmentation. The framework is designed to integrate with various fine-tuning mechanisms, demonstrated here with Adapter and Low-Rank Adaptation (LoRA) as representative examples. This extensibility lets the framework accommodate other emerging fine-tuning strategies, so that models retain SAM's general knowledge while effectively leveraging multimodal data. In addition, a pyramid-based Deep Fusion Module (DFM) integrates high-level geographic features across multiple scales, enhancing the feature representation before decoding. This work also highlights SAM's robust generalization to Digital Surface Model (DSM) data, a novel application. Extensive experiments on three benchmark multimodal remote sensing datasets, ISPRS Vaihingen, ISPRS Potsdam, and MMHunan, demonstrate that the proposed MFNet significantly outperforms existing methods in multimodal semantic segmentation, while offering a versatile foundation for future research and applications. The source code is available at https://github.com/sstary/SSRS.
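For readers unfamiliar with Low-Rank Adaptation, the idea of keeping pretrained knowledge while adding a small trainable update can be illustrated with a minimal NumPy sketch. This is a generic illustration of LoRA applied to a single linear layer, not the MFNet implementation; the class name `LoRALinear` and the parameters `rank`, `alpha`, and `seed` are hypothetical names chosen for this example.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a low-rank update (generic LoRA sketch).

    Computes y = x @ (W + (alpha / rank) * B @ A).T, where W is the
    pretrained weight (kept frozen) and only A and B would be trained.
    """

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_features, in_features = weight.shape
        self.weight = weight  # pretrained weight, never updated
        # A gets a small random init; B starts at zero, so the layer
        # initially reproduces the pretrained output exactly.
        self.A = rng.normal(0.0, 0.01, size=(rank, in_features))
        self.B = np.zeros((out_features, rank))
        self.scale = alpha / rank

    def __call__(self, x):
        base = x @ self.weight.T              # frozen pretrained path
        update = (x @ self.A.T) @ self.B.T    # low-rank trainable path
        return base + self.scale * update


# Usage: wrap a (here random) "pretrained" weight and run a batch through it.
rng = np.random.default_rng(1)
W = rng.normal(size=(6, 5))
x = rng.normal(size=(3, 5))
layer = LoRALinear(W, rank=2, alpha=4.0)
y = layer(x)  # shape (3, 6)
```

Because `B` is initialized to zero, the wrapped layer starts out identical to the frozen one, and the number of trainable values is `rank * (in_features + out_features)` rather than `in_features * out_features`.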