Fine-tuning Multimodal Large Language Models (MLLMs) on task-specific data is an effective way to improve performance on downstream applications. However, such adaptation often degrades generalization on pretrained tasks, a phenomenon known as catastrophic forgetting. Existing methods that aim to mitigate this issue either become ineffective when fine-tuning deeper layers of the language decoder or scale poorly with increasing model size. To address these limitations, we propose Model-Dowser, a novel sparse fine-tuning approach for MLLMs. Model-Dowser computes a principled importance score for each model parameter with respect to pretrained generalization (prior to downstream adaptation) by jointly considering weight magnitudes, input activations, and output sensitivities. During fine-tuning, Model-Dowser selectively preserves high-importance parameters and updates the remaining ones. Comprehensive experiments on two representative MLLMs, LLaVA and NVILA, demonstrate that Model-Dowser effectively mitigates catastrophic forgetting and consistently outperforms prior methods, while remaining resource-efficient and scalable to multi-billion-parameter models.
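The abstract does not give the exact scoring formula. As a hypothetical illustration only (the product form, function names, and the gradient-masking step are assumptions, not the paper's definition), a per-parameter score combining weight magnitude, input-activation norm, and output sensitivity, followed by a trainability mask, might be sketched as:

```python
import torch

def importance_scores(weight, act_norm, out_sens):
    """Hypothetical per-parameter importance: |W_ij| * ||x_j|| * s_i.

    weight:   (out_dim, in_dim) layer weights
    act_norm: (in_dim,)  per-input-channel activation norms
    out_sens: (out_dim,) per-output sensitivity magnitudes
    """
    return weight.abs() * act_norm.unsqueeze(0) * out_sens.unsqueeze(1)

def trainable_mask(weight, act_norm, out_sens, train_frac=0.5):
    """Mask with 1 = low-importance (updated during fine-tuning)
    and 0 = high-importance (frozen to preserve generalization)."""
    s = importance_scores(weight, act_norm, out_sens)
    k = max(1, int(train_frac * s.numel()))
    thresh = s.flatten().kthvalue(k).values  # k-th smallest score
    return (s <= thresh).float()
```

In such a scheme, the mask could be applied multiplicatively to gradients after each backward pass (e.g. `weight.grad *= mask`) so that only the low-importance parameters are updated.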