Current federated multimodal continual learning over mixture-of-experts low-rank adaptation (MoE-LoRA) rests on the unverified assumption that routing isolates task-specific knowledge into disjoint experts. We argue that this assumption fails: routing operates per sample, whereas forgetting accumulates across the task sequence, and gradient conflict persists within each expert even when routing is maximally polarized. Activation-subspace protection can also fail, for two reasons: under parameter-efficient fine-tuning it entangles tasks because of a dimension-counting bound, and federated averaging (FedAvg) disrupts client-side orthogonality. To address both problems, we propose PRISM (Per-expert Routing-projection Interference-informed Subspace Method), which maintains a per-expert gradient subspace basis whose orthogonality is preserved under FedAvg and reinterprets MoE routing as a capacity allocator. On LLaVA-1.5-7B, LLaVA-1.5-13B, and Qwen2.5-VL-7B across CoIN-6 and CoIN-Long-10, PRISM outperforms sixteen state-of-the-art baselines in average accuracy. Its margin over the best federated multimodal baseline grows from +3.23 pp on CoIN-6 to +6.06 pp on CoIN-Long-10.
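To make the idea of a per-expert gradient subspace basis concrete, the following is a minimal sketch of gradient projection followed by federated averaging. It is not the PRISM algorithm, whose details the abstract does not give. It assumes that every client projects against the same orthonormal basis for a given expert; the names `project_out`, `fedavg`, `expert_basis`, and `client_grads`, as well as all numeric values, are hypothetical.

```python
# Illustrative sketch only; not the PRISM implementation.
# Assumption: all clients share one orthonormal basis per expert.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def project_out(grad, basis):
    """Remove from grad its components along each orthonormal basis vector."""
    out = list(grad)
    for b in basis:
        coeff = dot(grad, b)  # valid because the basis is orthonormal
        out = [o - coeff * bi for o, bi in zip(out, b)]
    return out

def fedavg(updates):
    """Unweighted element-wise mean of client updates."""
    n = len(updates)
    return [sum(vals) / n for vals in zip(*updates)]

# Hypothetical protected directions for one expert (orthonormal, in R^4).
expert_basis = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]

# Hypothetical raw gradients from three clients for that expert.
client_grads = [
    [0.5, -1.0, 2.0, 0.0],
    [1.5, 0.25, -1.0, 3.0],
    [-2.0, 4.0, 0.5, 1.0],
]

# Each client projects locally, then the server averages.
projected = [project_out(g, expert_basis) for g in client_grads]
averaged = fedavg(projected)

# Overlap of the averaged update with each protected direction.
overlaps = [dot(averaged, b) for b in expert_basis]
print(overlaps)
```

Because projection onto a fixed subspace is linear, the average of updates that each lie in the orthogonal complement also lies in it, so the overlaps are zero here. If each client instead projected against its own, different basis, the averaged update would in general overlap with some client's protected directions; this is one possible reading of how FedAvg can disrupt client-side orthogonality.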