This paper presents LOLA, a massively multilingual large language model trained on more than 160 languages using a sparse Mixture-of-Experts Transformer architecture. Our architectural and implementation choices address the challenge of harnessing linguistic diversity while maintaining efficiency and avoiding the common pitfalls of multilinguality. Our analysis of the evaluation results shows competitive performance in natural language generation and understanding tasks. Additionally, we demonstrate how the learned expert-routing mechanism exploits implicit phylogenetic linguistic patterns to potentially alleviate the curse of multilinguality. We provide an in-depth look at the training process, an analysis of the datasets, and a balanced exploration of the model's strengths and limitations. As an open-source model, LOLA promotes reproducibility and serves as a robust foundation for future research. Our findings enable the development of compute-efficient multilingual models with strong, scalable performance across languages.