Multimodal Large Language Models (MLLMs) achieve strong performance through instruction tuning, but real-world deployment requires them to continually expand their capabilities, making Multimodal Continual Instruction Tuning (MCIT) essential. Recent methods leverage sparse expert routing to promote task specialization, but we find that the expert routing process suffers from drift as the data distribution evolves. For example, a grounding query that previously activated localization experts may instead be routed to irrelevant experts after learning OCR tasks. Meanwhile, the grounding-related experts can be overwritten by new tasks and lose their original functionality. These failures reflect two problems: router drift, where expert selection becomes inconsistent over time, and expert drift, where shared experts are overwritten across tasks. We therefore propose StAbilized Mixture-of-Experts (SAME) for MCIT. To address router drift, SAME stabilizes expert selection by decomposing routing dynamics into orthogonal subspaces and updating only task-relevant directions. To mitigate expert drift, we regulate expert updates via curvature-aware scaling based on historical input covariance, in a rehearsal-free manner. SAME also introduces adaptive expert activation, which freezes selected experts during training to reduce redundant computation and cross-task interference. Extensive experiments demonstrate that SAME achieves state-of-the-art performance.
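The two stabilization ideas can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the orthonormal basis `U_prev` (standing in for previous-task routing directions) and the helper names are hypothetical, and the subspace here is random rather than estimated from activation statistics. The sketch shows (a) projecting a router update onto the orthogonal complement of a protected subspace, so previous-task routing directions are left untouched, and (b) damping expert updates along directions with large historical input covariance, a simple form of curvature-aware scaling.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hypothetical router/expert input dimension

# Hypothetical orthonormal basis of directions important to previous tasks
# (in practice this could be estimated from stored activation statistics).
U_prev = np.linalg.qr(rng.standard_normal((d, 3)))[0]  # d x 3, orthonormal columns

def stabilized_router_update(grad, U_prev):
    """Remove the component of a router gradient that lies in the
    protected previous-task subspace, so the update only moves along
    task-relevant (new) directions."""
    return grad - U_prev @ (U_prev.T @ grad)

def curvature_scaled_expert_update(grad, cov, eps=1e-3):
    """Scale an expert gradient by the (damped) inverse of the historical
    input covariance: frequently seen directions receive smaller updates,
    protecting earlier functionality without storing past data."""
    return np.linalg.solve(cov + eps * np.eye(cov.shape[0]), grad)

grad = rng.standard_normal(d)
g_proj = stabilized_router_update(grad, U_prev)
# The projected update has no component inside the protected subspace.
assert np.allclose(U_prev.T @ g_proj, 0.0, atol=1e-10)
```

Both operations act only on gradients and summary statistics, which is why the scheme is rehearsal-free: no past samples need to be replayed, only a subspace basis and an input covariance estimate are carried forward.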