Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation

In this paper, we first assess and harness various Vision Foundation Models (VFMs) in the context of Domain Generalized Semantic Segmentation (DGSS). Driven by the motivation that Leveraging Stronger pre-trained models and Fewer trainable parameters for Superior generalizability, we introduce a robust fine-tuning approach, namely Rein, to parameter-efficiently harness VFMs for DGSS. Built upon a set of trainable tokens, each linked to distinct instances, Rein precisely refines and forwards the feature maps from each layer to the next layer within the backbone. This process produces diverse refinements for different categories within a single image. With fewer trainable parameters, Rein efficiently fine-tunes VFMs for DGSS tasks, surprisingly surpassing full parameter fine-tuning. Extensive experiments across various settings demonstrate that Rein significantly outperforms state-of-the-art methods. Remarkably, with just an extra 1% of trainable parameters within the frozen backbone, Rein achieves a mIoU of 68.1% on the Cityscapes, without accessing any real urban-scene datasets.Code is available at https://github.com/w1oves/Rein.git.

翻译：本文首先评估并利用多种视觉基础模型（Vision Foundation Models, VFMs）在域泛化语义分割（Domain Generalized Semantic Segmentation, DGSS）中的表现。受“利用更强的预训练模型和更少的可训练参数以获得更优泛化能力”这一动机驱动，我们提出了一种名为Rein的鲁棒微调方法，以参数高效的方式利用VFMs解决DGSS问题。该方法基于一组与不同实例相关联的可训练令牌（tokens），对主干网络中每一层的特征图进行精确优化，并将其传递至下一层，从而为单幅图像中的不同类别生成多样化的改进。通过更少的可训练参数，Rein高效地微调VFMs用于DGSS任务，并出人意料地超越了全参数微调的效果。在多种设置下的广泛实验表明，Rein显著优于当前最先进方法。值得注意的是，仅在冻结的主干网络中额外引入1%的可训练参数，Rein在不访问任何真实城市场景数据集的情况下，即可在Cityscapes上达到68.1%的mIoU。代码已开源在 https://github.com/w1oves/Rein.git。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

专知会员服务

13+阅读 · 2022年3月12日

【CVPR 2022】一种无需使用负样本的自监督学习方法，Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

专知会员服务

15+阅读 · 2022年3月12日

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日