On-device LLMs have gained increasing attention for their ability to enhance privacy and provide a personalized user experience. To facilitate private learning with scarce data, Federated Learning has become a standard approach. However, it faces challenges such as computational resource heterogeneity and data heterogeneity among end users. We propose CoMiGS ($\textbf{Co}$llaborative learning with a $\textbf{Mi}$xture of $\textbf{G}$eneralists and $\textbf{S}$pecialists), the first approach to address both challenges. A key innovation of our method is the bi-level optimization formulation of the Mixture-of-Experts learning objective, where the router is optimized using a separate validation set to ensure alignment with the target distribution. We solve our objective with alternating minimization, for which we provide a theoretical analysis. Our method shares generalist experts across users while localizing a varying number of specialist experts, thereby adapting to users' computational resources and preserving privacy. Through extensive experiments, we show that CoMiGS effectively balances general and personalized knowledge for each token generation. We demonstrate that CoMiGS remains robust against overfitting, thanks to the generalists' regularizing effect, while adapting to local data through specialist expertise. We open-source our codebase for collaborative LLMs.
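The bi-level structure described above can be sketched in a toy setting: experts are fit on the training split, while the router gate is updated on a held-out validation split, alternating between the two. This is a minimal illustrative sketch, not the paper's implementation; the two linear "experts", the sigmoid gate, and all hyperparameters here are assumptions for illustration only.

```python
# Toy sketch of alternating minimization for a two-expert mixture:
# inner step fits the experts on training data, outer step fits the
# router (a single sigmoid gate here) on a separate validation set.
# NOTE: illustrative assumption, not the CoMiGS codebase.
import numpy as np

rng = np.random.default_rng(0)

# One user's toy 1-D regression data, split into train / validation.
x_tr, x_val = rng.normal(size=64), rng.normal(size=32)
y_tr = 2.0 * x_tr + 0.1 * rng.normal(size=64)
y_val = 2.0 * x_val + 0.1 * rng.normal(size=32)

w = np.array([0.0, 0.0])  # two linear experts ("generalist" / "specialist")
r = 0.0                   # router logit; gate = sigmoid(r)

def predict(x, w, r):
    g = 1.0 / (1.0 + np.exp(-r))        # mixture weight for expert 0
    return g * w[0] * x + (1.0 - g) * w[1] * x

lr = 0.05
for _ in range(200):
    # Inner step: gradient update of the experts on the TRAINING split.
    g = 1.0 / (1.0 + np.exp(-r))
    err = predict(x_tr, w, r) - y_tr
    w[0] -= lr * np.mean(err * g * x_tr)
    w[1] -= lr * np.mean(err * (1.0 - g) * x_tr)

    # Outer step: gradient update of the router on the VALIDATION split,
    # aligning the gate with the target distribution.
    err_v = predict(x_val, w, r) - y_val
    dgate = g * (1.0 - g) * (w[0] - w[1]) * x_val
    r -= lr * np.mean(err_v * dgate)

mse = np.mean((predict(x_val, w, r) - y_val) ** 2)
```

Separating the splits is the point of the bi-level formulation: the router's quality is judged on data the experts never fit, so it cannot simply memorize the training set.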