Vision Foundation Models (VFMs) excel in generalization due to large-scale pretraining, but fine-tuning them for Domain Generalized Semantic Segmentation (DGSS) while maintaining this ability remains challenging. Existing approaches either selectively fine-tune parameters or freeze the VFMs and update only the adapters, both of which may underutilize the VFMs' full potential in DGSS tasks. We observe that domain-sensitive parameters in VFMs, arising from task and distribution differences, can hinder generalization. To address this, we propose \textbf{FisherTune}, a robust fine-tuning method guided by the Domain-Related Fisher Information Matrix (DR-FIM). DR-FIM measures parameter sensitivity across tasks and domains, enabling selective updates that preserve generalization and enhance DGSS adaptability. FisherTune incorporates variational inference to stabilize DR-FIM estimation, treating parameters as Gaussian-distributed variables and leveraging pre-trained priors. Extensive experiments show that FisherTune achieves superior cross-domain segmentation while maintaining generalization, outperforming selective-parameter and adapter-based methods.
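The core idea above — scoring parameter sensitivity with Fisher information and updating only a selected subset — can be sketched with a toy example. This is a minimal illustration, not the paper's DR-FIM: it uses a logistic model, the diagonal empirical Fisher (mean squared per-sample gradient), and a hypothetical sensitivity score defined as the gap between source- and target-domain Fisher values; the actual selection criterion and update rule in FisherTune differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def diag_fisher(w, X, y):
    """Diagonal empirical Fisher: mean squared per-sample gradient
    of the logistic negative log-likelihood."""
    p = sigmoid(X @ w)
    g = (p - y)[:, None] * X          # per-sample gradients, shape (n, d)
    return (g ** 2).mean(axis=0)      # one Fisher value per parameter

rng = np.random.default_rng(0)
d = 8
w = rng.normal(size=d) * 0.1          # stand-in "pretrained" weights

# A source-domain batch and a shifted "target-like" batch; the additive
# shift is a hypothetical stand-in for a real domain gap.
X_src = rng.normal(size=(512, d))
X_tgt = X_src + rng.normal(scale=0.5, size=(512, d)) + 1.0
y_src = (X_src @ rng.normal(size=d) > 0).astype(float)
y_tgt = (X_tgt @ rng.normal(size=d) > 0).astype(float)

# Toy domain-sensitivity score: parameters whose Fisher information
# changes most across domains are treated as domain-sensitive.
score = np.abs(diag_fisher(w, X_src, y_src) - diag_fisher(w, X_tgt, y_tgt))

# Selective update mask: here we keep the least domain-sensitive half
# trainable (the direction of selection is purely illustrative).
k = d // 2
mask = np.zeros(d, dtype=bool)
mask[np.argsort(score)[:k]] = True
print("trainable-parameter mask:", mask)
```

In a real VFM this masking would be applied per tensor (or per block) during fine-tuning, and the abstract's variational treatment — parameters as Gaussian variables with the pretrained weights as prior — would replace the raw point estimate of the Fisher diagonal used here.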