The continuous scaling of large language models (LLMs) incurs prohibitive computational costs, making Mixture-of-Experts (MoE) a scalable alternative for efficient fine-tuning via sparse activation. While federated learning (FL) has emerged as a paradigm for privacy-preserving collaborative optimization, integrating MoE into FL under data heterogeneity can trigger conflicting expert optimizations: client-specific data distributions force same-indexed experts to optimize under inconsistent or even conflicting feature-label correlations. This mismatch induces destructive interference during aggregation, destabilizing the optimization trajectory and degrading model performance. To address this issue, we propose FC-MoE, a federated conflict-aware framework for MoE fine-tuning. It employs an importance-aware weighting scheme to prioritize reliable local updates and uses gradient consensus projection to suppress conflicting updates, ensuring a stable global optimization path. In addition, a local knowledge retention mechanism preserves specialized client expertise by re-anchoring domain-specific residuals. Extensive experiments demonstrate that FC-MoE accelerates convergence and improves both global and local model performance in non-IID federated environments.
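To make the aggregation idea concrete, the following is a minimal sketch of how importance-weighted averaging and a consensus projection could be combined for the updates of one same-indexed expert. It is an illustration under stated assumptions, not the FC-MoE algorithm itself: the abstract does not specify how importance scores are computed or how the projection is defined, so the scores are taken here as given inputs, the consensus direction is assumed to be the importance-weighted mean, and all function names (`weighted_mean`, `project_onto_consensus`, `aggregate_expert`) are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def weighted_mean(updates: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Importance-weighted average of client updates (weights are assumed inputs)."""
    total = sum(weights)
    dim = len(updates[0])
    return [sum(w * u[i] for u, w in zip(updates, weights)) / total
            for i in range(dim)]


def project_onto_consensus(update: Vector, consensus: Vector) -> Vector:
    """Remove the component of `update` that opposes `consensus`.

    Assumption: an update "conflicts" when its inner product with the
    consensus direction is negative. Non-conflicting updates are returned
    unchanged.
    """
    d = dot(update, consensus)
    norm_sq = dot(consensus, consensus)
    if d >= 0 or norm_sq == 0:
        return list(update)
    coef = d / norm_sq
    return [u - coef * c for u, c in zip(update, consensus)]


def aggregate_expert(updates: Sequence[Vector], importance: Sequence[float]) -> Vector:
    """Aggregate one expert's client updates with conflict suppression."""
    consensus = weighted_mean(updates, importance)
    projected = [project_onto_consensus(u, consensus) for u in updates]
    return weighted_mean(projected, importance)


if __name__ == "__main__":
    # Three clients update the same expert; the third pulls against the others.
    client_updates = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.5]]
    importance_scores = [1.0, 1.0, 1.0]  # hypothetical importance scores
    print(aggregate_expert(client_updates, importance_scores))
```

In this sketch, a conflicting update keeps only the part orthogonal to the consensus direction, so it no longer cancels the progress agreed on by the other clients. Raising a client's importance score shifts both the consensus direction and the final average toward that client's update.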