Federated Learning (FL) of Large Language Models (LLMs) in multilingual environments presents significant challenges stemming from heterogeneous language distributions across clients and disparities in language resource availability. To address these challenges, we extended the FederatedScope-LLM framework to support multilingual instruction-tuning experiments with LLMs. We also introduced a novel client-specific early stopping mechanism, Local Dynamic Early Stopping (LDES-FL), which allows clients to pause and resume local training based on client-side validation performance, improving training efficiency and sustainability. Through a series of experiments, we studied how client language composition, ranging from fully monolingual to increasingly multilingual clients, affects multilingual quality, fairness, and training cost. Monolingual local fine-tuning remains the most effective approach for single-language specialization, whereas federated training is better suited to learning a single, balanced multilingual model. In FL, increasing within-client multilinguality leads to stronger and fairer global models, narrows the gap to centralized multilingual fine-tuning, and yields the largest gains for lower-resource languages, albeit at the cost of more optimization steps. Overall, our results identify client language composition as a key design variable in multilingual FL, shaping performance, fairness, and efficiency.
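To make the pause/resume behavior described for LDES-FL more concrete, the following is a minimal, illustrative Python sketch of a client-side controller driven by local validation loss. The class name, the patience and min_delta parameters, and the per-round interface are assumptions made for illustration only; they are not the actual implementation in FederatedScope-LLM.

```python
# Illustrative sketch only: a client-side controller in the spirit of LDES-FL.
# All names and thresholds here are assumptions, not the paper's implementation.

class LocalDynamicEarlyStopping:
    """Pauses a client's local training when its validation loss stops
    improving, and resumes it if the aggregated global model later improves
    the client's validation loss again."""

    def __init__(self, patience: int = 3, min_delta: float = 1e-3):
        self.patience = patience          # rounds without improvement before pausing
        self.min_delta = min_delta        # minimum decrease that counts as improvement
        self.best_loss = float("inf")
        self.rounds_without_improvement = 0
        self.paused = False

    def update(self, val_loss: float) -> bool:
        """Call once per federated round with the client's validation loss.
        Returns True if the client should run local training this round."""
        if val_loss < self.best_loss - self.min_delta:
            # Validation improved: record the new best and (re)activate training.
            self.best_loss = val_loss
            self.rounds_without_improvement = 0
            self.paused = False
        else:
            self.rounds_without_improvement += 1
            if self.rounds_without_improvement >= self.patience:
                # No improvement for `patience` rounds: pause local training
                # (the client can still receive and evaluate the global model).
                self.paused = True
        return not self.paused
```

Under this sketch, each client would evaluate the received global model on its local validation split once per round, call `update`, and skip local fine-tuning whenever it returns False; a paused client resumes automatically if later aggregation rounds improve its validation loss again.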